CN113066025A - Image defogging method based on incremental learning and feature and attention transfer - Google Patents

Image defogging method based on incremental learning and feature and attention transfer

Info

Publication number
CN113066025A
Authority
CN
China
Prior art keywords
network
layer
feature
convolution
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110304663.2A
Other languages
Chinese (zh)
Other versions
CN113066025B (en)
Inventor
王科平
李冰锋
韦金阳
杨艺
李新伟
崔立志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN202110304663.2A priority Critical patent/CN113066025B/en
Publication of CN113066025A publication Critical patent/CN113066025A/en
Application granted granted Critical
Publication of CN113066025B publication Critical patent/CN113066025B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/90 - Determination of colour characteristics
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration

Abstract

The invention discloses an image defogging method based on incremental learning and feature and attention transfer, which comprises the following steps: S1, constructing a self-encoder network serving as a teacher network, and extracting a first intermediate layer feature map and a first feature attention map; S2, constructing a defogging network serving as a student network, outputting a second intermediate layer feature map and a second feature attention map to fit the first intermediate layer feature map and the first feature attention map, and enhancing the corresponding features with the third feature attention map obtained after fitting; S3, training the teacher network with multiple groups of paired identical images; S4, optimally training the student network with multiple groups of paired fog images and clear images; S5, training the student network under the combined action of an SSIM loss function and a Smooth L1 loss function; and S6, performing an incremental operation on the data set of the student network, improving the defogging capability of the defogging network on other data.

Description

Image defogging method based on incremental learning and feature and attention transfer
Technical Field
The invention relates to the field of image processing, in particular to an image defogging method based on incremental learning and feature and attention transfer.
Background
In recent years, air quality has deteriorated and hazy weather has gradually increased owing to industrial production, automobile emissions and other causes. The absorption and scattering of light by particles suspended in the air give fog images acquired by imaging equipment low contrast, color distortion, blurring and similar defects. Haze directly degrades the visual effect of an image and limits high-level computer vision tasks that take images as their processing objects, so research on clarifying hazy images is of great significance in the field of computer vision.
Image restoration methods based on the atmospheric scattering model and image defogging methods based on deep learning are currently the mainstream approaches. However, restoration methods based on the atmospheric scattering model suffer from residual fog, image distortion and other problems caused by inaccurate estimation of intermediate parameters, while deep-learning defogging methods generalize poorly because of the limitations of their data sets. A method that improves both the defogging ability and the generalization ability of a defogging network is therefore needed.
Disclosure of Invention
The invention aims to solve the above problems by providing an image defogging method based on incremental learning and feature and attention transfer that is simple to operate and improves the defogging effect.
In order to achieve the purpose, the technical scheme of the invention is as follows:
an image defogging method based on incremental learning and feature and attention transfer comprises the following steps:
S1, constructing a self-encoder network serving as a teacher network, and extracting first intermediate layer feature maps and first feature attention maps of different dimensions from the self-encoder network for subsequent training of the student network;
S2, constructing a defogging network serving as a student network, the defogging network consisting of residual blocks and two convolution layers, using a Smooth L1 loss function to constrain the second intermediate layer feature maps of different dimensions and the second feature attention maps output by the residual blocks to fit the first intermediate layer feature maps and first feature attention maps extracted from the self-encoder network, and using the third feature attention map obtained after fitting as a weight to enhance the corresponding features;
S3, training the teacher network using multiple groups of paired identical images as its input and labels;
S4, optimally training the student network using multiple groups of paired fog images and clear images as its input and labels;
S5, using the Smooth L1 loss function as the loss function between the labels and the defogging results of the teacher network and the student network, using the SSIM loss function as the loss function between the first intermediate layer feature maps and the second intermediate layer feature maps, using the Smooth L1 loss function as the loss function between the first feature attention maps and the second feature attention maps, and training the student network under the combined action of the SSIM and Smooth L1 loss functions;
and S6, performing an incremental operation on the data set of the student network to improve the defogging capability of the defogging network on other data.
Further, the self-encoder network in step S1 is composed of a convolution module and an upsampling module. The convolution module comprises four convolution layers: the first layer uses 64 3x3 convolution kernels with step size 2 and pad 1, i.e. f1 = 3, c1 = 64, denoted Conv1(3,64,3); the second layer uses 128 3x3 convolution kernels with step size 1 and pad 1, i.e. f2 = 3, c2 = 128, denoted Conv2(64,128,3); the third layer uses 256 3x3 convolution kernels with step size 2 and pad 1, i.e. f3 = 3, c3 = 256, denoted Conv3(128,256,3); the fourth layer uses 512 3x3 convolution kernels with step size 1 and pad 1, i.e. f4 = 3, c4 = 512, denoted Conv4(256,512,3).
The upsampling module corresponds to the convolution module and comprises four deconvolution layers: the first layer uses 256 4x4 convolution kernels with step size 2 and pad 1, i.e. f1' = 4, c1' = 256, denoted TranConv1(512,256,4); the second layer uses 128 1x1 convolution kernels with step size 1 and pad 0, i.e. f2' = 1, c2' = 128, denoted TranConv2(256,128,1); the third layer uses 64 4x4 convolution kernels with step size 2 and pad 1, i.e. f3' = 4, c3' = 64, denoted TranConv3(128,64,4); the fourth layer uses 3 1x1 convolution kernels with step size 1 and pad 0, i.e. f4' = 1, c4' = 3, denoted TranConv4(64,3,1).
Further, each residual block in step S2 adopts two 3x3 convolution layers with pad 1 and step size 1, keeping the input and output dimensions unchanged, i.e. each residual block has the Conv-ReLU-Conv-ReLU-Add format; in addition, a convolution layer with a 3x3 kernel, step size 2 and pad 1 is added before the first and the third residual blocks to perform downsampling.
Further, the third feature attention map in step S2 is applied to the features in the feature enhancement module, and the enhanced features are then input into the student network.
Further, in step S5, the Smooth L1 loss is used as the loss function between the output and the label and between the first feature attention map and the second feature attention map. The Smooth L1 loss function is an improvement on the L1 norm loss function, whose mathematical formula is:
$$L_{1} = \frac{1}{N}\sum_{i=1}^{N}\left|J_{i}-\hat{J}_{i}\right|$$

where J is the label, $\hat{J}$ is the network estimation result, and N is the number of samples;
the mathematical formula for the Smooth L1 loss function is:
$$L_{SmoothL1} = \frac{1}{N}\sum_{i=1}^{N}\operatorname{smooth}_{L1}\left(J_{i}-\hat{J}_{i}\right)$$

where

$$\operatorname{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & \left|x\right|<1\\ \left|x\right|-0.5, & \text{otherwise.}\end{cases}$$
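For illustration, a minimal PyTorch sketch of this loss (equivalent to torch.nn.SmoothL1Loss with beta = 1; the function name is an assumption, not part of the patent) is:

```python
import torch

def smooth_l1(estimate: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Smooth L1 loss per the formula above: quadratic where |x| < 1,
    linear elsewhere, averaged over all elements."""
    diff = estimate - label
    abs_diff = diff.abs()
    per_element = torch.where(abs_diff < 1.0, 0.5 * diff ** 2, abs_diff - 0.5)
    return per_element.mean()
```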
further, the mathematical formula of the SSIM loss function in step S5 is as follows:
$$SSIM(x,\hat{x}) = \frac{\left(2\mu_{x}\mu_{\hat{x}}+c_{1}\right)\left(2\sigma_{x\hat{x}}+c_{2}\right)}{\left(\mu_{x}^{2}+\mu_{\hat{x}}^{2}+c_{1}\right)\left(\sigma_{x}^{2}+\sigma_{\hat{x}}^{2}+c_{2}\right)}$$

where x is the intermediate fog-image feature learned by the student network, $\hat{x}$ is the intermediate fog-free-image feature output by the teacher network, $\mu_{x}$ and $\mu_{\hat{x}}$ are the means of the second intermediate layer feature map and the first intermediate layer feature map respectively, $\sigma_{x}^{2}$ and $\sigma_{\hat{x}}^{2}$ are their variances, and $\sigma_{x\hat{x}}$ is their covariance; $c_{1}=(k_{1}L)^{2}$ and $c_{2}=(k_{2}L)^{2}$ are constants, $k_{1}$ is 0.01, $k_{2}$ is 0.03, and L is the dynamic range of the image pixel values.
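A non-windowed PyTorch sketch of the corresponding loss term, following the formula above, might look like this (practical SSIM implementations usually average over local windows, e.g. an 11x11 Gaussian; the global statistics here are a simplification):

```python
import torch

def ssim_loss(x: torch.Tensor, x_hat: torch.Tensor,
              L: float = 255.0, k1: float = 0.01, k2: float = 0.03) -> torch.Tensor:
    """1 - SSIM between a student feature map x and a teacher feature map
    x_hat, computed from global means, variances and covariance."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x = x.var(unbiased=False)
    var_y = x_hat.var(unbiased=False)
    cov_xy = ((x - mu_x) * (x_hat - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim  # minimizing 1 - SSIM maximizes structural similarity
```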
Further, the incremental operation on the data set in the student network in step S6 includes the following steps:
S61, selecting an indoor fog map data set to train the self-encoder network;
S62, inputting the indoor fog map data set as the training data set into the defogging network, and simultaneously inputting the clear image corresponding to each fog map in the indoor fog map data set into the self-encoder network;
S63, on the basis of the parameters of the defogging network in step S62, retaining part of the indoor fog maps and adding part of the outdoor fog maps as the training data set to retrain the defogging network.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides an image defogging method based on incremental learning and feature and attention transfer, which can effectively improve the defogging and generalization capabilities of a defogging network; a double-network model is adopted on a network structure, a self-encoder is used as a teacher network, a middle-layer characteristic diagram and a characteristic attention diagram of a fog-free image are extracted to increase the constraint of a loss function and guide the learning of a defogging network (a student network), an incremental learning method idea is adopted on a training mode, the defogging network is trained by using an indoor fog diagram data set, a small sample data set including an indoor fog diagram and an outdoor fog diagram is used after the training is finished, the network is retrained, the forgetting of the defogging network to the original knowledge is reduced under the combined action of the guidance of the teacher network and the retention of a small number of data sets, and the defogging effect of the image is improved.
The invention has strong defogging capability on indoor image data. When the network's defogging effect on an outdoor image data set needs to be improved, only a small amount of image data is required for incremental learning; the network does not need to be retrained with a large amount of data, which saves considerable time. The method performs well on both data sets and outperforms other advanced defogging methods.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a general block diagram of a network of the present invention;
FIG. 2 is a schematic structural diagram of the feature enhancement module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments of the present invention by a person skilled in the art without any creative effort, should be included in the protection scope of the present invention.
The invention provides a dual-network defogging method combining attention, incremental learning and related techniques, comprising the following steps:
(1) Construct a self-encoder network serving as the teacher network to reconstruct clear images, and extract the network's intermediate layer feature maps and feature attention maps of different dimensions for training the subsequent student network. The teacher network comprises an upsampling module and a downsampling module.
(2) Construct a defogging network serving as the student network to clarify fog images. The network consists of residual blocks built from skip connections; the feature maps of different dimensions output by the residual blocks are constrained with a Smooth L1 loss function to fit the teacher network's feature maps and attention maps, and the fitted attention maps serve as weights to enhance the corresponding features. The output feature map of each layer of the student network matches the dimensions of the corresponding teacher network layer.
(3) Train the teacher network using multiple sets of paired identical images as the teacher network's inputs and labels.
(4) Train the student network using multiple sets of paired fog images and clear images as its inputs and labels.
(5) Use the Smooth L1 norm as the loss function between the labels and the network defogging results of the teacher and student networks; measure the difference between the two networks by using the SSIM loss function between their feature maps and the Smooth L1 loss function between the first and second attention maps. The two loss functions work together to train the student network.
(6) The student network fits the intermediate features of the teacher network as closely as possible, and the features are enhanced with attention, improving the student network's feature extraction capability and thereby its defogging capability.
(7) Incrementally extend the student network's data set to improve the network's defogging capability on other data and strengthen its generalization ability.
In step (1), the self-encoder consists of a four-layer convolution module and an upsampling module. The first convolution layer uses 64 3x3 kernels with step size 2 and pad 1, i.e. f1 = 3, c1 = 64; this layer can be represented as Conv1(3,64,3). The second layer uses 128 3x3 kernels with step size 1 and pad 1, i.e. f2 = 3, c2 = 128, denoted Conv2(64,128,3). The third layer uses 256 3x3 kernels with step size 2 and pad 1, i.e. f3 = 3, c3 = 256, denoted Conv3(128,256,3). The fourth layer uses 512 3x3 kernels with step size 1 and pad 1, i.e. f4 = 3, c4 = 512, denoted Conv4(256,512,3). To prevent the grid effect that deconvolution produces when restoring an overly small feature map, downsampling is performed only twice while the channels are expanded, which facilitates network training and prevents information loss; the output of each convolution layer is activated with a ReLU to increase the nonlinearity of the network. The upsampling operation corresponds to the convolution module: four deconvolution layers restore the image to its original size while the number of channels is likewise restored. Specifically, the first deconvolution layer uses 256 4x4 kernels with step size 2 and pad 1 for one upsampling step, i.e. f1' = 4, c1' = 256 (512 channels reduced to 256), denoted TranConv1(512,256,4); the second layer uses 128 1x1 kernels with step size 1 and pad 0, leaving the feature map size unchanged, i.e. f2' = 1, c2' = 128, denoted TranConv2(256,128,1); the third layer uses 64 4x4 kernels with step size 2 and pad 1, doubling the feature map size as in the first layer, i.e. f3' = 4, c3' = 64, denoted TranConv3(128,64,4); the fourth layer uses 3 1x1 kernels with step size 1 and pad 0, leaving the feature map size unchanged, i.e. f4' = 1, c4' = 3, denoted TranConv4(64,3,1).
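The self-encoder described above could be sketched in PyTorch as follows; this is a minimal sketch under the stated layer specification, and the class and variable names are assumptions, not part of the patent:

```python
import torch
import torch.nn as nn

class TeacherAutoEncoder(nn.Module):
    """Self-encoder: Conv1(3,64,3)...Conv4(256,512,3) for encoding,
    TranConv1(512,256,4)...TranConv4(64,3,1) for decoding."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, stride=2, padding=1)     # Conv1(3,64,3)
        self.conv2 = nn.Conv2d(64, 128, 3, stride=1, padding=1)   # Conv2(64,128,3)
        self.conv3 = nn.Conv2d(128, 256, 3, stride=2, padding=1)  # Conv3(128,256,3)
        self.conv4 = nn.Conv2d(256, 512, 3, stride=1, padding=1)  # Conv4(256,512,3)
        self.up1 = nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1)  # TranConv1
        self.up2 = nn.ConvTranspose2d(256, 128, 1, stride=1, padding=0)  # TranConv2
        self.up3 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)   # TranConv3
        self.up4 = nn.ConvTranspose2d(64, 3, 1, stride=1, padding=0)     # TranConv4
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # every convolution is followed by a ReLU activation
        f1 = self.relu(self.conv1(x))
        f2 = self.relu(self.conv2(f1))
        f3 = self.relu(self.conv3(f2))
        f4 = self.relu(self.conv4(f3))
        y = self.relu(self.up1(f4))
        y = self.relu(self.up2(y))
        y = self.relu(self.up3(y))
        y = self.up4(y)  # restore 3 channels and the original image size
        # f1..f4 are the intermediate layer feature maps used to supervise
        # the student network
        return y, [f1, f2, f3, f4]
```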
In step (2), the defogging network uses residual blocks to form the network backbone; the "identity mapping" reduces the loss of feature information during feature extraction, retains more information, facilitates network training, and prevents the gradient "explosion" problem. The defogging network uses four residual blocks and performs feature extraction together with two downsampling convolution layers. To ensure that each layer's feature map has the same dimensions as the corresponding teacher network feature map, each residual block adopts two 3x3 convolution layers with pad 1 and step size 1, keeping the input and output dimensions unchanged, i.e. each block has the Conv-ReLU-Conv-ReLU-Add format with no BN layer (experiments show that a BN layer causes color distortion in the image). A convolution with a 3x3 kernel, step size 2 and pad 1 is added before the first and the third residual blocks to perform downsampling. In addition, the features of each layer of the teacher and student networks are input into a feature enhancement (FE) module, and the enhanced features are fed into the next layer of the student network. Finally, an upsampling operation, identical to the teacher network's, is applied to the feature map to recover a clear fog-free image.
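A minimal sketch of the residual block and the downsampling convolution described above (any channel widths used to assemble the full backbone are assumptions):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-ReLU-Conv-ReLU-Add: two 3x3 convolutions (stride 1, pad 1),
    no BN layer, with an identity skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        return out + x  # identity mapping preserves feature information

def downsample(in_ch: int, out_ch: int) -> nn.Module:
    """3x3, stride-2, pad-1 convolution placed before the first and third
    residual blocks to perform downsampling."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                         nn.ReLU(inplace=True))
```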
The self-encoder adopts an encoding-decoding structure to reconstruct images; it is chosen as the teacher because it learns the mapping from an original image back to itself. The underlying idea is that features extracted from a fog-free clear input are more representative, and more suitable for recovering a fog-free image, than features extracted from a fog image. The invention therefore takes a well-trained self-encoder as the teacher network, extracts the fog-free feature map and attention map from each intermediate layer, computes loss functions against the corresponding feature maps and attention maps of the defogging network (student network), and fits the feature maps extracted by the defogging network to those of the teacher. The attention map is obtained by transforming the feature map with the Sigmoid function: owing to the function's characteristics, important feature pixels are transformed into larger values, i.e. larger weights, so the neural network places more attention on important features. However, the Sigmoid maps feature values into (0, 1), and multiplying them element-wise with the original feature map as attention weights gradually shrinks the feature values, which hinders network training. To address this, the invention adds an identity mapping: the original feature map and the processed feature map are added element-wise, achieving feature enhancement. This is realized by the FE (feature enhancement) module in FIG. 1. The invention selects feature maps extracted by different convolution layers for feature enhancement, so that the features extracted by the defogging network fit the teacher network features more comprehensively; the FE structure is shown in FIG. 2.
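The FE module of FIG. 2 could be sketched as follows (a minimal PyTorch sketch; the class name is an assumption):

```python
import torch
import torch.nn as nn

class FeatureEnhancement(nn.Module):
    """FE module: the Sigmoid maps feature values into (0, 1) to form an
    attention map, the map reweights the original features, and the
    identity addition keeps the enhanced values from shrinking."""
    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        attention = torch.sigmoid(feat)   # feature attention map
        return feat + attention * feat    # element-wise identity addition
```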
In step (5), the difference between the clear image output by the network and the real clear image is measured with a Smooth L1 loss function, and the network is trained by minimizing this loss. The Smooth L1 loss function is an improvement on the L1 norm loss function; the L1 norm loss can be expressed as:
$$L_{1} = \frac{1}{N}\sum_{i=1}^{N}\left|J_{i}-\hat{J}_{i}\right|$$

where J is the label, $\hat{J}$ is the network estimation result, and N is the number of samples. The L1 loss function is robust, but its center point is a non-smooth breakpoint, which can make the solution unstable; to address this, the Smooth L1 loss function was proposed as an improvement on the L1 loss, and its mathematical formula can be expressed as:
$$L_{SmoothL1} = \frac{1}{N}\sum_{i=1}^{N}\operatorname{smooth}_{L1}\left(J_{i}-\hat{J}_{i}\right)$$

where

$$\operatorname{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & \left|x\right|<1\\ \left|x\right|-0.5, & \text{otherwise.}\end{cases}$$
in step (6), in addition to calculating the Loss1 of the fog map and the fog-free map, the intermediate output of the teacher network is used as a student network soft label, the Loss LOSS _ F between the feature maps of each layer and the Loss LOSS _ A between the attention maps are increased, smooth L1 Loss is used as a Loss function between the fog image and the estimated fog-free image and between the two network intermediate attention maps, and Structural Similarity (SSIM) is used as a Loss function between the intermediate feature maps of the two network outputs. The structural similarity loss is used for measuring the structural similarity between two images, the structural similarity is compared from three aspects of brightness, contrast and structure, the evaluation standard of SSIM is similar to the visual system of human, the sensing of local structural change is sensitive, the detail processing is more perfect, and the network performance is greatly improved due to the constraint of a multi-loss function. The SSIM mathematical expression is:
$$SSIM(x,\hat{x}) = \frac{\left(2\mu_{x}\mu_{\hat{x}}+c_{1}\right)\left(2\sigma_{x\hat{x}}+c_{2}\right)}{\left(\mu_{x}^{2}+\mu_{\hat{x}}^{2}+c_{1}\right)\left(\sigma_{x}^{2}+\sigma_{\hat{x}}^{2}+c_{2}\right)}$$

where x is the intermediate fog-image feature learned by the student network, $\hat{x}$ is the intermediate fog-free-image feature output by the teacher network, $\mu_{x}$ and $\mu_{\hat{x}}$ are the means of the feature maps, $\sigma_{x}^{2}$ and $\sigma_{\hat{x}}^{2}$ are their variances, and $\sigma_{x\hat{x}}$ is their covariance. $c_{1}=(k_{1}L)^{2}$ and $c_{2}=(k_{2}L)^{2}$ are constants; $k_{1}$ and $k_{2}$ default to 0.01 and 0.03 respectively, and L is the dynamic range of the image pixel values, taken as 255 in the invention.
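Combining the three constraints, the total training loss might be sketched as below, reusing the smooth_l1 and ssim_loss sketches given earlier; equal weighting of the three terms is an assumption, as the patent does not state the weights:

```python
def total_loss(dehazed, label, student_feats, teacher_feats,
               student_atts, teacher_atts):
    """Loss1 (output vs. label) + LOSS_F (feature maps, SSIM) +
    LOSS_A (attention maps, Smooth L1), summed over supervised layers."""
    loss1 = smooth_l1(dehazed, label)
    loss_f = sum(ssim_loss(s, t) for s, t in zip(student_feats, teacher_feats))
    loss_a = sum(smooth_l1(s, t) for s, t in zip(student_atts, teacher_atts))
    return loss1 + loss_f + loss_a
```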
In step (7), to enhance the network's generalization ability, an incremental learning scheme is adopted for the learning stage; the structure is shown in FIG. 1 (right). Network training is divided into three steps. First, the self-encoder (teacher network) is trained with an indoor fog image data set, giving the teacher network good indoor image reconstruction and feature extraction capabilities. Second, the defogging network (student network) is trained on the indoor fog image data set: the fog image is input into the defogging network while the corresponding clear image is input into the self-encoder, training the defogging network to remove indoor haze. Third, starting from the defogging network parameters of the second step, a small number of indoor fog maps are retained and a small number of outdoor fog maps are added as the data set, and the network is retrained. A drawback of incremental learning is the forgetting of old knowledge: when the network learns new knowledge, part of the existing knowledge is forgotten. The teacher network provided by the invention not only reduces the student network's forgetting of existing knowledge but also improves its performance on the new knowledge.
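The third-step data mix could be sketched as follows; keep_ratio is an assumed value, since the patent specifies only that "a small amount" of indoor data is retained:

```python
import random

def build_incremental_dataset(indoor_pairs, outdoor_pairs, keep_ratio=0.1):
    """Retain a small fraction of the indoor fog/clear pairs and add the
    outdoor pairs; the defogging network is then retrained on this mix
    starting from its second-step parameters (no re-initialization)."""
    n_keep = max(1, int(keep_ratio * len(indoor_pairs)))
    retained = random.sample(list(indoor_pairs), n_keep)
    return retained + list(outdoor_pairs)
```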
The experimental results were compared on the ITS data set; the objective evaluation index comparison is shown in Table 1.
TABLE 1 Objective evaluation index comparison
The results were also compared on the OTS data set; the objective evaluation index comparison is shown in Table 2.
TABLE 2 Objective evaluation index
As is apparent from Tables 1 and 2, the technical scheme of the invention effectively improves the defogging and generalization abilities of the defogging network. The invention adopts a dual-network model: a self-encoder serves as the teacher network, and the intermediate layer feature maps and feature attention maps it extracts from fog-free images add loss-function constraints that guide the learning of the defogging network (student network). The training mode follows the idea of incremental learning: the defogging network is first trained with an indoor fog map data set and then retrained with a small-sample data set after the initial training finishes. Under the combined action of the teacher network and the retained small number of samples, the defogging network's forgetting of its original knowledge is reduced and the image defogging effect is improved.
The invention has strong defogging capability on indoor image data. When the network's defogging effect on an outdoor image data set needs to be improved, only a small amount of image data is required for incremental learning; the network does not need to be retrained with a large amount of data, which saves considerable time. The method performs well on both data sets and outperforms other advanced defogging methods.

Claims (7)

1. An image defogging method based on incremental learning and feature and attention transfer, characterized in that the method comprises the following steps:
S1, constructing a self-encoder network serving as a teacher network, and extracting first intermediate layer feature maps and first feature attention maps of different dimensions from the self-encoder network for subsequent training of the student network;
S2, constructing a defogging network serving as a student network, the defogging network consisting of residual blocks and two convolution layers, using a Smooth L1 loss function to constrain the second intermediate layer feature maps of different dimensions and the second feature attention maps output by the residual blocks to fit the first intermediate layer feature maps and first feature attention maps extracted from the self-encoder network, and using the third feature attention map obtained after fitting as a weight to enhance the corresponding features;
S3, training the teacher network using multiple groups of paired identical images as its input and labels;
S4, optimally training the student network using multiple groups of paired fog images and clear images as its input and labels;
S5, using the Smooth L1 loss function as the loss function between the labels and the defogging results of the teacher network and the student network, using the SSIM loss function as the loss function between the first intermediate layer feature maps and the second intermediate layer feature maps, using the Smooth L1 loss function as the loss function between the first feature attention maps and the second feature attention maps, and training the student network under the combined action of the SSIM and Smooth L1 loss functions;
and S6, performing an incremental operation on the data set of the student network to improve the defogging capability of the defogging network on other data.
2. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 1, wherein: the self-encoder network in step S1 is composed of a convolution module and an upsampling module; the convolution module comprises four convolution layers: the first layer uses 64 3x3 convolution kernels with step size 2 and pad 1, i.e. f1 = 3, c1 = 64, denoted Conv1(3,64,3); the second layer uses 128 3x3 convolution kernels with step size 1 and pad 1, i.e. f2 = 3, c2 = 128, denoted Conv2(64,128,3); the third layer uses 256 3x3 convolution kernels with step size 2 and pad 1, i.e. f3 = 3, c3 = 256, denoted Conv3(128,256,3); the fourth layer uses 512 3x3 convolution kernels with step size 1 and pad 1, i.e. f4 = 3, c4 = 512, denoted Conv4(256,512,3);
the upsampling module corresponds to the convolution module and comprises four deconvolution layers: the first layer uses 256 4x4 convolution kernels with step size 2 and pad 1, i.e. f1' = 4, c1' = 256, denoted TranConv1(512,256,4); the second layer uses 128 1x1 convolution kernels with step size 1 and pad 0, i.e. f2' = 1, c2' = 128, denoted TranConv2(256,128,1); the third layer uses 64 4x4 convolution kernels with step size 2 and pad 1, i.e. f3' = 4, c3' = 64, denoted TranConv3(128,64,4); the fourth layer uses 3 1x1 convolution kernels with step size 1 and pad 0, i.e. f4' = 1, c4' = 3, denoted TranConv4(64,3,1).
3. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 2, wherein: each residual block in step S2 adopts two 3x3 convolution layers with pad 1 and step size 1, keeping the input and output dimensions unchanged, i.e. each residual block has the Conv-ReLU-Conv-ReLU-Add format; in addition, a convolution layer with a 3x3 kernel, step size 2 and pad 1 is added before the first and the third residual blocks to perform downsampling.
4. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 3, wherein: the third feature attention map in step S2 is applied to the features in the feature enhancement module, and the enhanced features are then input into the student network.
5. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 4, wherein: in step S5, the Smooth L1 loss is used as the loss function between the output and the label and between the first feature attention map and the second feature attention map; the Smooth L1 loss function is an improvement on the L1 norm loss function, whose mathematical formula is:
$$L_{1} = \frac{1}{N}\sum_{i=1}^{N}\left|J_{i}-\hat{J}_{i}\right|$$

where J is the label, $\hat{J}$ is the network estimation result, and N is the number of samples;
the mathematical formula for the Smooth L1 loss function is:
$$L_{SmoothL1} = \frac{1}{N}\sum_{i=1}^{N}\operatorname{smooth}_{L1}\left(J_{i}-\hat{J}_{i}\right)$$

where

$$\operatorname{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & \left|x\right|<1\\ \left|x\right|-0.5, & \text{otherwise.}\end{cases}$$
6. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 5, wherein: the mathematical formula of the SSIM loss function in step S5 is:
$$SSIM(x,\hat{x}) = \frac{\left(2\mu_{x}\mu_{\hat{x}}+c_{1}\right)\left(2\sigma_{x\hat{x}}+c_{2}\right)}{\left(\mu_{x}^{2}+\mu_{\hat{x}}^{2}+c_{1}\right)\left(\sigma_{x}^{2}+\sigma_{\hat{x}}^{2}+c_{2}\right)}$$

where x is the intermediate fog-image feature learned by the student network, $\hat{x}$ is the intermediate fog-free-image feature output by the teacher network, $\mu_{x}$ and $\mu_{\hat{x}}$ are the means of the second intermediate layer feature map and the first intermediate layer feature map respectively, $\sigma_{x}^{2}$ and $\sigma_{\hat{x}}^{2}$ are their variances, and $\sigma_{x\hat{x}}$ is their covariance; $c_{1}=(k_{1}L)^{2}$ and $c_{2}=(k_{2}L)^{2}$ are constants, $k_{1}$ is 0.01, $k_{2}$ is 0.03, and L is the dynamic range of the image pixel values.
7. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 6, wherein: the incremental operation on the data set in the student network in the step S6 includes the following steps:
S61, selecting an indoor fog map data set to train the self-encoder network;
S62, inputting the indoor fog map data set as the training data set into the defogging network, and simultaneously inputting the clear image corresponding to each fog map in the indoor fog map data set into the self-encoder network;
S63, on the basis of the parameters of the defogging network in step S62, retaining part of the indoor fog maps and adding part of the outdoor fog maps as the training data set to retrain the defogging network.
CN202110304663.2A 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer Active CN113066025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304663.2A CN113066025B (en) 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110304663.2A CN113066025B (en) 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer

Publications (2)

Publication Number Publication Date
CN113066025A (en) 2021-07-02
CN113066025B CN113066025B (en) 2022-11-18

Family

ID=76562797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304663.2A Active CN113066025B (en) 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer

Country Status (1)

Country Link
CN (1) CN113066025B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9177363B1 (en) * 2014-09-02 2015-11-03 National Taipei University Of Technology Method and image processing apparatus for image visibility restoration
CN111598793A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for defogging image of power transmission line and storage medium
CN111681178A (en) * 2020-05-22 2020-09-18 厦门大学 Knowledge distillation-based image defogging method
CN112184577A (en) * 2020-09-17 2021-01-05 西安理工大学 Single image defogging method based on multi-scale self-attention generative adversarial network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HAIYAN WU ET AL.: "Knowledge Transfer Dehazing Network for NonHomogeneous Dehazing", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW)》 *
KEPING WANG ET AL.: "Uneven Image Dehazing by Heterogeneous Twin Network", 《IEEE ACCESS》 *
MING HONG ET AL.: "Distilling Image Dehazing With Heterogeneous Task Imitation", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
赵银湖: "Research on Single Image Dehazing Algorithms Based on Deep Learning", 《China Excellent Master's Theses Full-text Database, Information Science and Technology Series》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592830A (en) * 2021-08-04 2021-11-02 航天信息股份有限公司 Image defect detection method and device and storage medium
CN113592830B (en) * 2021-08-04 2024-05-03 航天信息股份有限公司 Image defect detection method, device and storage medium
CN113592742A (en) * 2021-08-09 2021-11-02 天津大学 Method for removing image moire
CN114785890A (en) * 2021-12-31 2022-07-22 北京泰迪熊移动科技有限公司 Crank call identification method and device

Also Published As

Publication number Publication date
CN113066025B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN108492271B (en) Automatic image enhancement system and method fusing multi-scale information
CN108921799B (en) Remote sensing image thin cloud removing method based on multi-scale collaborative learning convolutional neural network
Wang et al. Dehazing for images with large sky region
CN110738697A (en) Monocular depth estimation method based on deep learning
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN110020989B (en) Depth image super-resolution reconstruction method based on deep learning
CN112184577B (en) Single image defogging method based on multi-scale self-attention generative adversarial network
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN109410144B (en) End-to-end image defogging processing method based on deep learning
CN113066025B (en) Image defogging method based on incremental learning and feature and attention transfer
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN111931857B (en) MSCFF-based low-illumination target detection method
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
CN111429392A (en) Multi-focus image fusion method based on multi-scale transformation and convolution sparse representation
CN111402138A (en) Image super-resolution reconstruction method of supervised convolutional neural network based on multi-scale feature extraction fusion
WO2023212997A1 (en) Knowledge distillation based neural network training method, device, and storage medium
CN110738660A (en) Spine CT image segmentation method and device based on improved U-net
CN116311254A (en) Image target detection method, system and equipment under severe weather condition
Guo et al. Multifeature extracting CNN with concatenation for image denoising
CN114418987A (en) Retinal vessel segmentation method and system based on multi-stage feature fusion
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN116452469B (en) Image defogging processing method and device based on deep learning
CN116128768B (en) Unsupervised image low-illumination enhancement method with denoising module
CN111612803B (en) Vehicle image semantic segmentation method based on image definition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant