CN113066025B - Image defogging method based on incremental learning and feature and attention transfer - Google Patents



Publication number
CN113066025B
CN113066025B (application CN202110304663.2A)
Authority
CN
China
Prior art keywords
network
layer
feature
convolution
defogging
Prior art date
Legal status
Active
Application number
CN202110304663.2A
Other languages
Chinese (zh)
Other versions
CN113066025A (en)
Inventor
王科平
李冰锋
韦金阳
杨艺
李新伟
崔立志
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Technology
Priority to CN202110304663.2A
Publication of CN113066025A
Application granted
Publication of CN113066025B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/90: Determination of colour characteristics
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06T 5/00: Image enhancement or restoration

Abstract

The invention discloses an image defogging method based on incremental learning and feature and attention transfer, which comprises the following steps: S1, constructing a self-encoder network as a teacher network and extracting a first intermediate-layer feature map and a first feature attention map; S2, constructing a defogging network as a student network, outputting a second intermediate-layer feature map and a second feature attention map to fit the first intermediate-layer feature map and first feature attention map, and enhancing the corresponding features with the third feature attention map obtained after fitting; S3, training the teacher network with multiple groups of paired identical images; S4, optimizing the student network with multiple groups of paired foggy and clear images; S5, training the student network under the joint action of an SSIM loss function and a Smooth L1 loss function; and S6, performing an incremental operation on the student network's data set to improve the defogging capability of the defogging network on other data.

Description

Image defogging method based on incremental learning and feature and attention transfer
Technical Field
The invention relates to the field of image processing, in particular to an image defogging method based on incremental learning and feature and attention transfer.
Background
In recent years, air quality has deteriorated and hazy weather has become more frequent owing to industrial production, automobile emissions and other causes. Suspended particles in the air absorb and scatter light, so foggy images acquired by imaging equipment suffer from low contrast, color distortion and blurring. Haze directly degrades the visual quality of an image and limits high-level computer vision tasks that take images as their processing object, so research on haze image sharpening is of great significance in the field of computer vision.
Image restoration methods based on the atmospheric scattering model and image defogging methods based on deep learning are currently the mainstream. However, restoration methods based on the atmospheric scattering model suffer from residual fog and image distortion caused by inaccurate estimation of intermediate parameters, while deep-learning defogging methods generalize poorly because of data set limitations. A method that improves both the defogging and the generalization ability of a defogging network is therefore needed to solve these problems.
Disclosure of Invention
The invention aims to solve the above problems by providing an image defogging method based on incremental learning and feature and attention transfer that is simple to operate and improves the defogging effect.
In order to achieve this purpose, the technical scheme of the invention is as follows:
an image defogging method based on incremental learning and feature and attention transfer comprises the following steps:
s1, constructing a self-encoder network serving as a teacher network, and extracting first intermediate layer feature graphs and first feature attention graphs of different dimensions in the self-encoder network for subsequent training of a student network;
s2, constructing a defogging network serving as a student network, wherein the defogging network consists of a residual block and two layers of convolutions, and outputting a second intermediate layer characteristic diagram with different dimensions and a first intermediate layer characteristic diagram extracted from a coder network by using a Smooth L1 loss function to constrain the residual block, and fitting a first characteristic attention diagram and a first intermediate layer characteristic diagram by using a third characteristic attention diagram obtained after fitting as weights to perform enhancement operation on corresponding characteristics;
s3, using a plurality of groups of same images in pairs as input and labels of a teacher network to train the teacher network;
s4, using a plurality of groups of paired fog images and clear images as input and labels of the student network to carry out optimization training on the student network;
s5, using a Smooth L1 loss function as a loss function between labels and defogging results in a teacher network and a student network, simultaneously using an SSIM loss function as a loss function between a first middle layer characteristic diagram and a second middle layer characteristic diagram, using the Smooth L1 loss function as a loss function between a first attention diagram and a second attention diagram, and training the student network under the joint action of the SSIM loss function and the Smooth L1 loss function;
and S6, performing incremental operation on the data set in the student network, and improving the defogging capacity of the defogging network on other data.
Further, the self-encoder network in step S1 consists of a convolution module and an up-sampling module. The convolution module comprises four convolution layers: the first layer uses 64 3x3 convolution kernels with stride 2 and pad 1, i.e. f1=3, c1=64, and may be written Conv1(3, 64, 3); the second layer uses 128 3x3 kernels with stride 1 and pad 1, i.e. f2=3, c2=128, written Conv2(64, 128, 3); the third layer uses 256 3x3 kernels with stride 2 and pad 1, i.e. f3=3, c3=256, written Conv3(128, 256, 3); the fourth layer uses 512 3x3 kernels with stride 1 and pad 1, i.e. f4=3, c4=512, written Conv4(256, 512, 3).
The up-sampling module corresponds to the convolution module and comprises four deconvolution layers: the first layer uses 256 4x4 kernels with stride 2 and pad 1, i.e. f1'=4, c1'=256, written TranConv1(512, 256, 4); the second uses 128 1x1 kernels with stride 1 and pad 0, i.e. f2'=1, c2'=128, written TranConv2(256, 128, 1); the third uses 64 4x4 kernels with stride 2 and pad 1, i.e. f3'=4, c3'=64, written TranConv3(128, 64, 4); the fourth uses 3 1x1 kernels with stride 1 and pad 0, i.e. f4'=1, c4'=3, written TranConv4(64, 3, 1).
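As a check on the layer specification above, the spatial sizes and channel counts can be traced through the four convolution and four deconvolution layers with the standard output-size formulas (a sketch; the 256x256 RGB input resolution and the helper names are illustrative assumptions, not values from the patent):

```python
def conv_out(size, kernel, stride, pad):
    # standard convolution output size: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride, pad):
    # transposed-convolution output size: (size - 1) * stride - 2*pad + kernel
    return (size - 1) * stride - 2 * pad + kernel

size, ch = 256, 3  # assume a 256x256 RGB input
for kernel, stride, pad, channels in [(3, 2, 1, 64), (3, 1, 1, 128),
                                      (3, 2, 1, 256), (3, 1, 1, 512)]:
    size, ch = conv_out(size, kernel, stride, pad), channels
assert (size, ch) == (64, 512)  # the two stride-2 layers halve the map twice

for kernel, stride, pad, channels in [(4, 2, 1, 256), (1, 1, 0, 128),
                                      (4, 2, 1, 64), (1, 1, 0, 3)]:
    size, ch = deconv_out(size, kernel, stride, pad), channels
assert (size, ch) == (256, 3)  # original resolution and channel count restored
```

The 4x4/stride-2 deconvolution kernels exactly invert the size halving of the 3x3/stride-2 convolutions, which is why the decoder restores the input resolution.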
Further, the residual block in step S2 uses two 3x3 convolution layers with pad 1 and stride 1, keeping the input and output dimensions unchanged; each residual block follows the Conv-ReLU-Conv-ReLU-Add format. In addition, a convolution layer with a 3x3 kernel, stride 2 and pad 1 is added before the first and the third residual block to perform downsampling.
Further, in step S2 the third feature attention map is used by the feature enhancement module to enhance the features, and the enhanced features are then input into the student network.
Further, in step S5 a Smooth L1 loss is used as the loss function between the output and the label and between the first and second feature attention maps. The Smooth L1 loss function is obtained by improving the L1 norm loss function, whose mathematical formula is:

$$L_1(J,\hat{J})=\frac{1}{N}\sum_{i=1}^{N}\left|J_i-\hat{J}_i\right|$$

where $J$ is the label, $\hat{J}$ is the network estimation result, and $N$ is the number of samples;

the mathematical formula of the Smooth L1 loss function is:

$$L_{\text{Smooth L1}}=\frac{1}{N}\sum_{i=1}^{N}\operatorname{smooth}_{L1}\left(J_i-\hat{J}_i\right)$$

where

$$\operatorname{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise.}\end{cases}$$
further, the mathematical formula of the SSIM loss function in step S5 is:
Figure BDA0002987602170000043
wherein x is the intermediate characteristic of the foggy image of the student network learning,
Figure BDA0002987602170000044
intermediate feature, mu, of fog-free image output for teacher's network x
Figure BDA0002987602170000045
Respectively the average value of the second intermediate layer characteristic diagram and the first intermediate layer characteristic diagram,
Figure BDA0002987602170000046
the variances of the second intermediate layer characteristic diagram and the first intermediate layer characteristic diagram are respectively,
Figure BDA0002987602170000047
the covariance of the second interlayer feature map and the first interlayer feature map; c. C 1 =(k 1 L) 2 、c 1 =(k 2 L) 2 Is a constant number, k 1 Is 0.01,k 2 Is 0.03 and L is the pixel value dynamic range of the image.
Further, the incremental operation on the data set in the student network in step S6 comprises the following steps:
S61, selecting an indoor fog image data set to train the self-encoder network;
S62, inputting the indoor fog image data set as a training data set into the defogging network, while inputting the clear image corresponding to each fog image into the self-encoder network;
and S63, on the basis of the defogging network parameters from step S62, retaining part of the indoor fog images, adding part of the outdoor fog images as the training data set, and retraining the defogging network.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides an image defogging method based on incremental learning and feature and attention transfer, which can effectively improve the defogging and generalization capabilities of a defogging network; a double-network model is adopted on a network structure, a self-encoder is used as a teacher network, a middle-layer characteristic diagram and a characteristic attention diagram of a fog-free image are extracted to increase the constraint of a loss function and guide the learning of a defogging network (a student network), an incremental learning method idea is adopted on a training mode, the defogging network is trained by using an indoor fog diagram data set, a small sample data set including an indoor fog diagram and an outdoor fog diagram is used after the training is finished, the network is retrained, the forgetting of the defogging network to the original knowledge is reduced under the combined action of the guidance of the teacher network and the retention of a small number of data sets, and the defogging effect of the image is improved.
The invention has strong defogging ability on indoor image data. When the defogging effect on an outdoor image data set needs to be improved, only a small amount of image data is required for incremental learning of the network; the network does not have to be retrained on a large amount of data, which saves considerable time. The method performs well on both data sets and compares favorably with other state-of-the-art defogging methods.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a general block diagram of the network of the present invention;
FIG. 2 is a schematic structural diagram of the feature enhancement module.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The invention provides a dual-network defogging method that combines attention, incremental learning and other techniques, comprising the following steps:
(1) Construct a self-encoder network as the teacher network to reconstruct clear images, and extract the network's intermediate-layer feature maps and feature attention maps of different dimensions for training the subsequent student network. The teacher network comprises a down-sampling module and an up-sampling module.
(2) Construct a defogging network as the student network to sharpen foggy images. The network consists of residual blocks with skip connections; a Smooth L1 loss function constrains the feature maps of different dimensions output by the residual blocks to fit the teacher network's feature maps and attention maps, and the fitted attention map is used as a weight to enhance the corresponding features. The dimension of each layer's output feature map in the student network corresponds to the teacher network.
(3) Train the teacher network using multiple groups of paired identical images as its input and labels.
(4) Optimize the student network using multiple groups of paired foggy and clear images as its input and labels.
(5) Use a Smooth L1 norm as the loss function measuring the difference between the labels and the network defogging results of the teacher and student networks; further use an SSIM loss function between the feature maps of the teacher and student networks and a Smooth L1 loss function between the first and second attention maps. The two loss functions act jointly to train the student network.
(6) The student network fits the intermediate features of the teacher network as closely as possible, and attention enhances these features, improving the student network's feature extraction ability and hence its defogging ability.
(7) Incrementing the student network's data set improves the network's defogging ability on other data and strengthens its generalization ability.
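The joint loss in steps (3) to (5) can be sketched in numpy as follows (an illustrative version; the weighting factors `lam_f` and `lam_a` and the single global SSIM window are our assumptions, since the patent does not specify loss weights):

```python
import numpy as np

def smooth_l1(a, b):
    # Smooth L1: quadratic near zero, linear elsewhere
    d = np.abs(a - b)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean()

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    # single-window SSIM between two feature maps
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def student_loss(dehazed, label, feat_s, feat_t, att_s, att_t,
                 lam_f=1.0, lam_a=1.0):
    # Loss1 (result vs. label) + LOSS_F (feature maps) + LOSS_A (attention maps)
    return (smooth_l1(dehazed, label)
            + lam_f * (1.0 - ssim_global(feat_s, feat_t))
            + lam_a * smooth_l1(att_s, att_t))

x = np.full((4, 4), 100.0)
assert abs(student_loss(x, x, x, x, x, x)) < 1e-9  # perfect student incurs no loss
```

Using 1 - SSIM as the feature-map term makes all three terms vanish when the student exactly matches both the label and the teacher's intermediate outputs.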
In step (1), the self-encoder consists of four convolution layers and an up-sampling module. The first convolution layer uses 64 3x3 kernels with stride 2 and pad 1, i.e. f1=3, c1=64, written Conv1(3, 64, 3); the second uses 128 3x3 kernels with stride 1 and pad 1, i.e. f2=3, c2=128, written Conv2(64, 128, 3); the third uses 256 3x3 kernels with stride 2 and pad 1, i.e. f3=3, c3=256, written Conv3(128, 256, 3); the fourth uses 512 3x3 kernels with stride 1 and pad 1, i.e. f4=3, c4=512, written Conv4(256, 512, 3). To prevent the grid effect that appears when deconvolution restores an image from a feature map that is too small, the invention down-samples only twice while expanding the channels, which aids network training and prevents information loss; after each convolution layer the output is activated with ReLU to increase the nonlinearity of the network. The up-sampling operation corresponds to the convolution module: four deconvolution layers restore the image to its original size and channel count. Specifically, the first deconvolution layer uses 256 4x4 kernels with stride 2 and pad 1 to up-sample once, i.e. f1'=4, c1'=256, reducing 512 channels to 256, written TranConv1(512, 256, 4); the second uses 128 1x1 kernels with stride 1 and pad 0, leaving the feature map size unchanged, i.e. f2'=1, c2'=128, written TranConv2(256, 128, 1); the third uses 64 4x4 kernels with stride 2 and pad 1 to double the feature map as in the first layer, i.e. f3'=4, c3'=64, written TranConv3(128, 64, 4); the fourth uses 3 1x1 kernels with stride 1 and pad 0, leaving the size unchanged, i.e. f4'=1, c4'=3, written TranConv4(64, 3, 1).
In step (2), the defogging network uses residual blocks as its backbone; their "identity mapping" reduces the loss of feature information during feature extraction, retains more information, aids network training, and prevents gradient "explosion". The defogging network uses four residual blocks, with two convolution layers performing down-sampling during feature extraction. To keep each layer's feature map the same dimension as the teacher network's, each residual block uses two 3x3 convolution layers with pad 1 and stride 1, so the input and output dimensions are unchanged; each block follows the Conv-ReLU-Conv-ReLU-Add format with no BN layer, since experiments show that a BN layer causes color distortion in the image. A convolution with a 3x3 kernel, stride 2 and pad 1 is added before the first and the third residual block to perform down-sampling. In addition, the features of each layer of the teacher and student networks are input into a feature enhancement (FE) module, and the enhanced features are fed into the next layer of the student network. Finally, an up-sampling operation, identical to the teacher network's, is applied to the feature map to recover a clear fog-free image.
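The Conv-ReLU-Conv-ReLU-Add pattern described above can be sketched for a single-channel feature map as follows (a minimal numpy illustration with a naive 3x3 convolution; the kernel values and map size are arbitrary, and the real residual blocks operate on multi-channel tensors):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv3x3(x, k):
    # 3x3 convolution with stride 1 and pad 1: output keeps the spatial size of x
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return out

def residual_block(x, k1, k2):
    # Conv-ReLU-Conv-ReLU-Add: the identity mapping preserves feature information
    return relu(conv3x3(relu(conv3x3(x, k1)), k2)) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
y = residual_block(x, k1, k2)
assert y.shape == x.shape  # input and output dimensions unchanged, as the text requires
```

Because the block ends with an addition of the unmodified input, gradients can flow through the identity path even when the convolutions saturate, which is what makes deeper stacks trainable.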
The self-encoder adopts an encoding-decoding structure to reconstruct images; it is chosen as the teacher because it learns the mapping from an original image back to itself. Based on the idea that features extracted from a fog-free clear image are more representative, and more suitable for recovering a fog-free image, than those extracted from a foggy one, the invention takes a well-trained self-encoder as the teacher network, extracts the fog-free feature maps and attention maps from each intermediate layer of the network, computes loss functions against the corresponding feature maps and attention maps of the defogging network (student network), and fits the feature maps extracted by the defogging network to the teacher network's feature maps. In addition, the method transforms the feature map through the Sigmoid function to obtain an attention map; because of the characteristics of the Sigmoid function, important feature pixels map to larger values, i.e. larger weights, so the neural network pays more attention to the important features.
The Sigmoid function maps feature values into (0, 1); multiplying these values element-wise with the original feature map as attention weights would gradually shrink the feature values, which is unfavorable for network training. To address this, the invention adds an identity mapping: the original feature map and the weighted feature map are added element-wise, achieving feature enhancement. This is realized by the FE (feature enhancement) module in FIG. 1. The invention selects feature maps extracted by different convolution layers for feature enhancement so that the features extracted by the defogging network fit the teacher network's features more comprehensively; the FE structure is shown in FIG. 2.
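A minimal sketch of the FE module's computation, assuming element-wise Sigmoid attention plus the identity mapping described above (the function names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_enhance(f):
    # attention map from the Sigmoid transform: large activations get weight near 1
    a = sigmoid(f)
    # identity mapping: add the original map back, so repeated weighting by
    # values in (0, 1) does not shrink the features during training
    return f + a * f

f = np.array([[-2.0, 0.0, 4.0]])
enhanced = feature_enhance(f)
assert np.all(np.sign(enhanced) == np.sign(f))  # f*(1+a) keeps each feature's sign
```

Since the output is f*(1+a) with a in (0, 1), every feature's magnitude is preserved or amplified by up to a factor of two, never attenuated.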
In step (5), the Smooth L1 loss function measures the difference between the clear image output by the network and the real clear image, and the network is trained by minimizing this loss. The Smooth L1 loss function is obtained by improving the L1 norm loss function, whose mathematical formula can be expressed as:

$$L_1(J,\hat{J})=\frac{1}{N}\sum_{i=1}^{N}\left|J_i-\hat{J}_i\right|$$

where $J$ is the label, $\hat{J}$ is the network estimation result, and $N$ is the number of samples. The L1 loss function has good robustness, but its central point is a break point where it is not smooth, which leads to unstable solutions. To address this problem, the Smooth L1 loss function was proposed as an improvement of the L1 loss; its mathematical formula can be expressed as:

$$L_{\text{Smooth L1}}=\frac{1}{N}\sum_{i=1}^{N}\operatorname{smooth}_{L1}\left(J_i-\hat{J}_i\right)$$

where

$$\operatorname{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise.}\end{cases}$$
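The two formulas above correspond to the following numpy sketch (the helper name is ours):

```python
import numpy as np

def smooth_l1_loss(estimate, label):
    # piecewise form: quadratic near zero (smooth at the break point of L1),
    # linear elsewhere (robust to outliers, like L1)
    d = np.abs(estimate - label)
    per_element = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)
    return per_element.mean()

# small residuals are penalized quadratically, large ones linearly
assert smooth_l1_loss(np.array([0.5]), np.array([0.0])) == 0.125
assert smooth_l1_loss(np.array([2.0]), np.array([0.0])) == 1.5
```

The two branches meet at |x| = 1 with matching value and slope, which removes the non-smooth break point of the plain L1 loss.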
in step (6), besides calculating the Loss Loss1 of the fog map and the fog-free map, the intermediate output of the teacher network is used as a soft label of the student network, the Loss LOSS _ F between the feature maps of each layer and the Loss LOSS _ A between the attention maps are added, smooth L1 Loss is used as a Loss function between the foggy image and the estimated fogless image and between two network intermediate attention maps, and Structural Similarity Index (SSIM) is used as a Loss function between the intermediate characteristic maps output by the two networks. The structural similarity loss is used for measuring the structural similarity between two images, the structural similarity is compared from three aspects of brightness, contrast and structure, the evaluation standard of SSIM is similar to the visual system of human, the sensing of local structural change is sensitive, the detail processing is more perfect, and the network performance is greatly improved due to the constraint of a multi-loss function. The SSIM mathematical expression is:
Figure BDA0002987602170000094
wherein x is the intermediate characteristic of the fog map of the student network learning,
Figure BDA0002987602170000095
intermediate fog-free map feature, mu, output for teacher's network x
Figure BDA0002987602170000096
Are the average values of the characteristic maps respectively,
Figure BDA0002987602170000097
respectively, the variance of the feature map is obtained,
Figure BDA0002987602170000098
is the feature map covariance. c. C 1 =(k 1 L) 2 、c 1 =(k 2 L) 2 Is a constant number, k 1 、k 2 The dynamic ranges of the pixel values of the images are 0.01 and 0.03 by default, and the value of the method is 255.
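The SSIM expression can be sketched in numpy with a single global window (real SSIM implementations typically use a sliding local window, e.g. 11x11 Gaussian; the global simplification is our assumption):

```python
import numpy as np

def ssim(x, y, L=255.0, k1=0.01, k2=0.03):
    # global (single-window) SSIM between two feature maps
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

rng = np.random.default_rng(0)
a = rng.uniform(0, 255, size=(16, 16))
assert np.isclose(ssim(a, a), 1.0)  # identical maps give SSIM = 1
assert ssim(a, 255.0 - a) < 1.0     # dissimilar maps score lower
```

As a training objective, 1 - SSIM is minimized so that identical feature maps incur zero loss.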
In step (7), to strengthen the network's generalization ability, the learning stage adopts incremental learning, structured as shown in FIG. 1 (right). Training proceeds in three steps. First, train the self-encoder (teacher network), selecting indoor fog images as the data set so that the teacher network has good indoor image reconstruction and feature extraction ability. Second, train the defogging network (student network) on the indoor fog image data set: the foggy image is fed to the defogging network while its corresponding clear image is fed to the self-encoder, giving the defogging network the ability to remove indoor haze. Third, starting from the defogging network parameters of the second step, retain a small number of indoor fog maps, add a small number of outdoor fog maps as the data set, and retrain the network. The drawback of incremental learning is that old knowledge is forgotten: when the network learns new knowledge, part of the existing knowledge is lost. The teacher network proposed by the invention both reduces the student network's forgetting of existing knowledge and improves its performance on the new knowledge.
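The data-set construction for the third training step can be sketched as follows (the 20% retention ratio and the file-name scheme are illustrative assumptions; the patent only says "a small amount"):

```python
import random

def build_incremental_set(indoor, outdoor, keep_ratio=0.2, seed=0):
    """Retain a small share of the old (indoor) set and mix in the new
    (outdoor) set, so that retraining forgets less of the original knowledge.
    keep_ratio is an illustrative choice, not a value from the patent."""
    rng = random.Random(seed)
    retained = rng.sample(indoor, int(len(indoor) * keep_ratio))
    mixed = retained + list(outdoor)
    rng.shuffle(mixed)
    return mixed

indoor = [f"indoor_{i}" for i in range(100)]
outdoor = [f"outdoor_{i}" for i in range(30)]
train_set = build_incremental_set(indoor, outdoor)
assert len(train_set) == 50  # 20 retained indoor images + 30 new outdoor images
```

The retained indoor samples act as rehearsal data: together with the teacher network's guidance, they counteract catastrophic forgetting during the outdoor retraining pass.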
The experimental results were compared on the ITS data set, with objective evaluation indexes compared in Table 1.
TABLE 1. Objective evaluation index comparison
[table rendered as an image in the original]
The results were also compared on the OTS data set, with objective evaluation indexes compared in Table 2.
TABLE 2. Objective evaluation index comparison
[table rendered as an image in the original]
As is apparent from Tables 1 and 2, the technical scheme of the invention effectively improves the defogging and generalization ability of the defogging network. The invention adopts a dual-network model on the network structure: a self-encoder serves as the teacher network, and the intermediate-layer feature maps and feature attention maps it extracts from fog-free images add loss-function constraints that guide the learning of the defogging network (student network). The training procedure adopts the idea of incremental learning: the defogging network is first trained on an indoor fog image data set and, after training, retrained on a small-sample data set. Under the joint action of the teacher network and the retention of a small number of samples, the defogging network forgets less of its original knowledge and the image defogging effect is improved.

Claims (7)

1. An image defogging method based on incremental learning and feature and attention transfer, characterized in that the method comprises the following steps:
S1, constructing a self-encoder network serving as a teacher network, and extracting first intermediate-layer feature maps and first feature attention maps of different dimensions from the self-encoder network for subsequent training of a student network;
S2, constructing a defogging network serving as a student network, the defogging network consisting of residual blocks and two convolution layers; a Smooth L1 loss function constrains the residual blocks so that the second intermediate-layer feature maps and second feature attention maps of different dimensions they output fit the first intermediate-layer feature maps and first feature attention maps extracted from the self-encoder network, and a third feature attention map obtained by converting the fitted features is used as a weight to enhance the corresponding features;
S3, using multiple groups of paired identical clear images as both the input and the label of the teacher network to train the teacher network;
S4, using multiple groups of paired fog images and clear images as the input and labels of the student network to optimize the student network;
S5, using a Smooth L1 loss function as the loss function between the labels and the defogging results of the teacher network and the student network, an SSIM loss function as the loss function between the first intermediate-layer feature maps and the second intermediate-layer feature maps, and a Smooth L1 loss function as the loss function between the first feature attention maps and the second feature attention maps, the SSIM and Smooth L1 loss functions acting jointly to train the student network;
S6, performing an incremental operation on the data set in the student network to improve the defogging capability of the defogging network on other data.
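The claims leave the construction of the feature attention maps open. One common choice, borrowed from activation-based attention transfer and used here purely as an illustrative assumption, collapses a feature map's channels into a normalized spatial map that can then weight (enhance) the corresponding features as described in step S2. A minimal NumPy sketch:

```python
import numpy as np

def feature_attention(feat, p=2, eps=1e-8):
    """Collapse a (C, H, W) feature map into an (H, W) spatial attention
    map by summing the p-th power of absolute activations over channels,
    then normalizing. This construction is an assumption (activation-based
    attention transfer); the claims do not fix a formula."""
    att = (np.abs(feat) ** p).sum(axis=0)
    return att / (np.linalg.norm(att) + eps)

def enhance(feat, att):
    """Step-S2-style enhancement: use the attention map as a
    per-pixel weight, broadcast over channels."""
    return feat * att[None, :, :]
```

The same attention construction would be applied to both teacher and student features before the Smooth L1 attention loss of step S5 is computed.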
2. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 1, wherein: the self-encoder network in step S1 consists of a convolution module and an up-sampling module; the convolution module comprises four convolution layers; the first layer uses 64 convolution kernels of 3x3 with stride 2 and pad 1, i.e. f1 = 3, c1 = 64, so the first layer can be denoted Conv1(3, 64, 3); the second layer uses 128 convolution kernels of 3x3 with stride 1 and pad 1, i.e. f2 = 3, c2 = 128, denoted Conv2(64, 128, 3); the third layer uses 256 convolution kernels of 3x3 with stride 2 and pad 1, i.e. f3 = 3, c3 = 256, denoted Conv3(128, 256, 3); the fourth layer uses 512 convolution kernels of 3x3 with stride 1 and pad 1, i.e. f4 = 3, c4 = 512, denoted Conv4(256, 512, 3);
The up-sampling module mirrors the convolution module and comprises four deconvolution layers; the first layer uses 256 convolution kernels of 4x4, up-sampling with stride 2 and pad 1, i.e. f'1 = 4, c'1 = 256, denoted TranConv1(512, 256, 4); the second layer uses 128 convolution kernels of 1x1 with stride 1 and pad 0, i.e. f'2 = 1, c'2 = 128, denoted TranConv2(256, 128, 1); the third layer uses 64 convolution kernels of 4x4, up-sampling with stride 2 and pad 1, i.e. f'3 = 4, c'3 = 64, denoted TranConv3(128, 64, 4); the fourth layer uses 3 convolution kernels of 1x1 with stride 1 and pad 0, i.e. f'4 = 1, c'4 = 3, denoted TranConv4(64, 3, 1).
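The layer specification of claim 2 can be transcribed directly. A PyTorch sketch follows; the ReLU activations between layers are an assumption, since the claim lists only the convolution and deconvolution parameters:

```python
import torch
import torch.nn as nn

class TeacherAutoencoder(nn.Module):
    """Sketch of the claim-2 self-encoder: four convolutions
    (Conv1..Conv4) followed by four transposed convolutions
    (TranConv1..TranConv4). Activation choice is an assumption."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1),     # Conv1(3, 64, 3)
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=1, padding=1),   # Conv2(64, 128, 3)
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),  # Conv3(128, 256, 3)
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=1, padding=1),  # Conv4(256, 512, 3)
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # TranConv1
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 1, stride=1, padding=0),  # TranConv2
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # TranConv3
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 1, stride=1, padding=0),     # TranConv4
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

With these strides the two 4x4 stride-2 deconvolutions exactly undo the two stride-2 convolutions, so a 3-channel input of spatial size H x W is reconstructed at the same size.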
3. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 2, wherein: the residual block in step S2 adopts two 3x3 convolution layers with pad 1 and stride 1, keeping the input and output dimensions unchanged, i.e. each residual block follows the Conv-ReLU-Add format; in addition, a 3x3 convolution layer with stride 2 and pad 1 is added before the first and third residual blocks respectively to perform down-sampling.
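A sketch of the claim-3 block in PyTorch. The exact placement of the ReLUs within the "Conv-ReLU-Add format" is an interpretation, since the claim does not spell it out:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Claim-3 residual block: two 3x3 convolutions, stride 1, pad 1,
    dimensions preserved, with an identity shortcut (the Add)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))   # Conv-ReLU
        out = self.conv2(out)
        return self.relu(out + x)        # Add the identity shortcut

def downsample(in_ch, out_ch):
    """The 3x3 stride-2 pad-1 convolution the claim places before
    the first and third residual blocks; it halves H and W."""
    return nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
```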
4. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 3, wherein: in step S2, the third feature attention map performs the feature enhancement operation on the corresponding features in a feature enhancement module, and the enhanced features are then input into the student network.
5. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 4, wherein: in step S5, a Smooth L1 loss is used as the loss function between the output and the label and between the first feature attention map and the second feature attention map; the Smooth L1 loss function is an improvement on the L1 norm loss function, whose mathematical formula is:

$$L_{1} = \frac{1}{N}\sum_{i=1}^{N}\left|J_{i}-\hat{J}_{i}\right|$$

where $J$ is the label, $\hat{J}$ is the network estimation result, and $N$ is the number of samples;
the mathematical formula for the Smooth L1 loss function is:
Figure FDA0003796591380000033
wherein the content of the first and second substances,
Figure FDA0003796591380000034
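Claim 5's two losses can be checked numerically. A NumPy sketch; the piecewise threshold of 1 follows the standard Smooth L1 definition:

```python
import numpy as np

def l1_loss(J, J_hat):
    """Plain L1 norm loss: mean absolute error over the samples."""
    return float(np.mean(np.abs(J - J_hat)))

def smooth_l1_loss(J, J_hat):
    """Smooth L1: quadratic for small residuals (|x| < 1), linear
    otherwise, which damps the influence of outlier pixels."""
    x = J - J_hat
    per_elem = np.where(np.abs(x) < 1, 0.5 * x**2, np.abs(x) - 0.5)
    return float(np.mean(per_elem))
```

For a residual of 0.5 the loss contribution is 0.5 * 0.5^2 = 0.125 (quadratic branch); for a residual of 2 it is 2 - 0.5 = 1.5 (linear branch), versus 2.0 under the plain L1 loss.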
6. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 5, wherein: the mathematical formula of the SSIM loss function in step S5 is:

$$L_{SSIM} = 1-\frac{\left(2\mu_{x}\mu_{\hat{x}}+c_{1}\right)\left(2\sigma_{x\hat{x}}+c_{2}\right)}{\left(\mu_{x}^{2}+\mu_{\hat{x}}^{2}+c_{1}\right)\left(\sigma_{x}^{2}+\sigma_{\hat{x}}^{2}+c_{2}\right)}$$

where $x$ is the intermediate feature of the fog image learned by the student network and $\hat{x}$ is the intermediate feature of the fog-free image output by the teacher network; $\mu_{x}$ and $\mu_{\hat{x}}$ are the means of the second intermediate-layer feature map and the first intermediate-layer feature map respectively; $\sigma_{x}^{2}$ and $\sigma_{\hat{x}}^{2}$ are their variances; $\sigma_{x\hat{x}}$ is their covariance; $c_{1}=(k_{1}l)^{2}$ and $c_{2}=(k_{2}l)^{2}$ are constants, where $k_{1}=0.01$, $k_{2}=0.03$, and $l$ is the dynamic range of the image's pixel values.
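A minimal numeric sketch of the claim-6 loss. It uses global (whole-map) statistics rather than the windowed statistics of common SSIM implementations; that simplification is an assumption, since the claim does not specify windowing:

```python
import numpy as np

def ssim_loss(x, x_hat, l=1.0, k1=0.01, k2=0.03):
    """1 - SSIM between a student feature map x and a teacher feature
    map x_hat, computed from global means, variances and covariance.
    l is the dynamic range of the values; k1, k2 as in claim 6."""
    c1, c2 = (k1 * l) ** 2, (k2 * l) ** 2
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x, var_y = x.var(), x_hat.var()
    cov = ((x - mu_x) * (x_hat - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
    return float(1.0 - ssim)
```

Identical feature maps give SSIM = 1 and hence a loss of 0; the loss grows as the student's intermediate features diverge in structure from the teacher's.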
7. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 6, wherein the incremental operation on the data set in the student network in step S6 comprises the following steps:
S61, selecting an indoor fog image data set for training together with the self-encoder network;
S62, inputting the indoor fog image data set as the training data set into the defogging network, while inputting the clear image corresponding to each fog image in the indoor fog image data set into the self-encoder network;
S63, on the basis of the defogging network parameters obtained in step S62, retaining part of the indoor fog images and adding part of the outdoor fog images as the training data set to retrain the defogging network.
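The incremental step of claim 7 amounts to fine-tuning from the existing weights on a mixed data set that rehearses part of the old domain. A schematic plain-Python sketch; the 20% retention ratio is an illustrative assumption, as the claim says only "part of" the indoor images:

```python
import random

def build_incremental_set(indoor_pairs, outdoor_pairs, keep_ratio=0.2, seed=0):
    """Retain a fraction of the original indoor (fog, clear) pairs as a
    rehearsal set and mix in the new outdoor pairs, so retraining the
    defogging network (from its step-S62 parameters) does not forget
    the indoor domain. keep_ratio is not fixed by the claim."""
    rng = random.Random(seed)
    n_keep = max(1, int(len(indoor_pairs) * keep_ratio))
    kept = rng.sample(list(indoor_pairs), n_keep)
    mixed = kept + list(outdoor_pairs)
    rng.shuffle(mixed)
    return mixed
```

The defogging network would then be trained on the returned mixed set starting from the weights of step S62 rather than from random initialization.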
CN202110304663.2A 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer Active CN113066025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304663.2A CN113066025B (en) 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer


Publications (2)

Publication Number Publication Date
CN113066025A CN113066025A (en) 2021-07-02
CN113066025B true CN113066025B (en) 2022-11-18

Family

ID=76562797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304663.2A Active CN113066025B (en) 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer

Country Status (1)

Country Link
CN (1) CN113066025B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592830A (en) * 2021-08-04 2021-11-02 航天信息股份有限公司 Image defect detection method and device and storage medium
CN113592742A (en) * 2021-08-09 2021-11-02 天津大学 Method for removing image moire
CN114004315A (en) * 2021-12-31 2022-02-01 北京泰迪熊移动科技有限公司 Method and device for incremental learning based on small sample

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9177363B1 (en) * 2014-09-02 2015-11-03 National Taipei University Of Technology Method and image processing apparatus for image visibility restoration
CN111598793A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for defogging image of power transmission line and storage medium
CN111681178A (en) * 2020-05-22 2020-09-18 厦门大学 Knowledge distillation-based image defogging method
CN112184577A (en) * 2020-09-17 2021-01-05 西安理工大学 Single image defogging method based on multi-scale self-attention generation countermeasure network


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Distilling Image Dehazing With Heterogeneous Task Imitation;Ming Hong et al.;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》;20200805;第1-10页 *
Knowledge Transfer Dehazing Network for NonHomogeneous Dehazing;Haiyan Wu et al.;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)》;20200728;第1-9页 *
Uneven Image Dehazing by Heterogeneous Twin Network;KEPING WANG et al.;《IEEE Access》;20200707;第8卷;第118485-118496页 *
Research on Single Image Dehazing Algorithms Based on Deep Learning; Zhao Yinhu; China Masters' Theses Full-text Database, Information Science and Technology; 20210215 (No. 2); pp. I138-1768 *

Also Published As

Publication number Publication date
CN113066025A (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN110738697B (en) Monocular depth estimation method based on deep learning
CN113066025B (en) Image defogging method based on incremental learning and feature and attention transfer
CN108492271B (en) Automatic image enhancement system and method fusing multi-scale information
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN108921799B (en) Remote sensing image thin cloud removing method based on multi-scale collaborative learning convolutional neural network
CN110020989B (en) Depth image super-resolution reconstruction method based on deep learning
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN112184577B (en) Single image defogging method based on multiscale self-attention generation countermeasure network
CN109035251B (en) Image contour detection method based on multi-scale feature decoding
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN111222519B (en) Construction method, method and device of hierarchical colored drawing manuscript line extraction model
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN111931857B (en) MSCFF-based low-illumination target detection method
CN110807744B (en) Image defogging method based on convolutional neural network
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN111127354A (en) Single-image rain removing method based on multi-scale dictionary learning
CN111402138A (en) Image super-resolution reconstruction method of supervised convolutional neural network based on multi-scale feature extraction fusion
CN110738660A (en) Spine CT image segmentation method and device based on improved U-net
CN114638768B (en) Image rain removing method, system and equipment based on dynamic association learning network
WO2023212997A1 (en) Knowledge distillation based neural network training method, device, and storage medium
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN116452469B (en) Image defogging processing method and device based on deep learning
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
CN104123707B (en) Local rank priori based single-image super-resolution reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant