CN116862784A - Single image defogging method based on multi-teacher knowledge distillation

Info

Publication number: CN116862784A (application CN202310681883.6A; granted as CN116862784B)
Original language: Chinese (zh)
Prior art keywords: feature map, scale, computer, network model, decoding
Inventors: 兰云伟, 崔智高, 苏延召, 马铮, 蔡艳平, 王涛, 曹继平
Applicant and current assignee: Rocket Force University of Engineering of PLA
Legal status: Active (granted)

Classifications

    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y02A 90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a single image defogging method based on multi-teacher knowledge distillation, which comprises the following steps: 1. acquiring the training set images; 2. establishing a student network model; 3. extracting features from the foggy training images; 4. establishing a total loss function; 5. training the student network model with the foggy training images; 6. defogging a single image with the trained student network model. In the invention, an EPDN teacher network model and a PSD teacher network model guide the training of the student network model, which effectively improves the feature extraction capability of the student network; through four-scale encoding and decoding, the student network model extracts multi-scale information of the image to be defogged and effectively fuses its global and local features, further improving the defogging effect.

Description

Single image defogging method based on multi-teacher knowledge distillation
Technical Field
The invention belongs to the technical field of image defogging processing, and particularly relates to a single image defogging method based on multi-teacher knowledge distillation.
Background
The teacher models available for image defogging fall mainly into two categories: defogging methods based on prior information and defogging methods based on deep learning. Prior-information-based methods have advantages in recovering the visibility, contrast and texture structure of the image, while deep-learning-based methods perform better at improving the authenticity and color fidelity of the image. At present, however, the knowledge learned by a single teacher model is generally transferred to a student model so that the student model approaches the teacher's performance; because a single teacher model performs one-way knowledge transfer to the student network, the trained student model is often limited by the performance of that teacher model.
Therefore, what is currently lacking is a single image defogging method based on multi-teacher knowledge distillation that is simple in structure and reasonable in design, in which an EPDN teacher network model and a PSD teacher network model jointly train the student network model to effectively improve the feature extraction capability of the student network, and in which the student network model extracts multi-scale information of the image through four-scale encoding and decoding, effectively fusing its global and local features and thereby improving the defogging effect.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a single image defogging method based on multi-teacher knowledge distillation with simple steps and a reasonable design: an EPDN teacher network model and a PSD teacher network model guide the training of a student network model, effectively improving the feature extraction capability of the student network, and the student network model extracts multi-scale information of the image through four-scale encoding and decoding, effectively fusing its global and local features and thereby improving the defogging effect of the image.
In order to solve the technical problems, the invention adopts the following technical scheme: a single image defogging method based on multi-teacher knowledge distillation, which is characterized by comprising the following steps:
Step one, acquiring a training set image:
selecting an indoor training set from the foggy day image database RESIDE; the indoor training set comprises foggy training images and the corresponding fog-free training images, and the number of foggy training images equals the number of fog-free training images;
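For context, the RESIDE indoor training set stores each foggy image alongside the fog-free image it was synthesized from. A minimal PyTorch dataset sketch for such pairs follows; the file-naming rule (foggy '<id>_<k>_<beta>.png' matched to fog-free '<id>.png') is an assumption about the local copy of the dataset, not something the patent specifies.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class ResideITSPairs(Dataset):
    """(foggy, fog-free) pairs from a local copy of the RESIDE indoor set.

    Assumes foggy files are named '<id>_<k>_<beta>.png' and the matching
    fog-free image is '<id>.png'; adjust the pairing rule to the local data.
    """
    def __init__(self, hazy_dir, clear_dir):
        self.hazy_dir, self.clear_dir = hazy_dir, clear_dir
        self.hazy_files = sorted(os.listdir(hazy_dir))
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.hazy_files)

    def __getitem__(self, idx):
        hazy_name = self.hazy_files[idx]
        clear_name = hazy_name.split('_')[0] + '.png'
        hazy = Image.open(os.path.join(self.hazy_dir, hazy_name)).convert('RGB')
        clear = Image.open(os.path.join(self.clear_dir, clear_name)).convert('RGB')
        return self.to_tensor(hazy), self.to_tensor(clear)
```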
step two, establishing a student network model:
the method for establishing the student network model comprises the following specific processes:
step 201, establishing an encoder model of a student network by adopting a computer; the encoder model of the student network comprises a first scale network model, a second scale network model, a third scale network model and a fourth scale network model, wherein the first scale network model comprises a first convolution layer and two RDB modules based on PA, and the second scale network model comprises a second convolution layer, two RDB modules based on PA and a feature fusion module; the third scale network model comprises a third convolution layer, two RDB modules based on PA and a feature fusion module; the fourth scale network model comprises a fourth convolution layer, two RDB modules based on PA and a feature fusion module;
step 202, adopting a computer to establish a decoder model of the student network; the decoder model of the student network comprises a first decoding network model, a second decoding network model, a third decoding network model, a fourth decoding network model and a fifth convolution layer, wherein the first decoding network model comprises two RDB modules based on PA, and the second decoding network model comprises a first transpose convolution layer, two RDB modules based on PA and a feature fusion module; the third decoding network model comprises a second transpose convolution layer, two RDB modules based on PA and a feature fusion module; the fourth decoding network model comprises a third transpose convolution layer, two RDB modules based on PA and a feature fusion module;
Step three, extracting features of the foggy training images:
step 301, extracting features of the foggy training image I through a first scale network model by using a computer to obtain a first scale feature map F e1
Step 302, adopting a computer to make the first scale feature map F e1 Feature extraction is carried out through a second scale network model, and a second scale feature map F is obtained e2
Step 303, adopting a computer to make the second scale feature map F e2 Feature extraction is carried out through a third-scale network model, and a third-scale feature map F is obtained e3
Step 304, computer-integrating the third scale feature map F e3 Feature extraction is carried out through a fourth-scale network model, and a fourth-scale feature map F is obtained e4
Step 305, using a computer to map the fourth scale feature map F e4 Extracting features through the first decoding network model to obtain a first decoding feature map F d1
Step 306, using a computer to decode the first decoding feature map F d1 Feature extraction is carried out through a second decoding network model to obtain a second decoding feature map F d2
Step 307, using a computer to decode the feature map F d2 Extracting features through a third decoding network model to obtain a third decoding feature map F d3
Step 308, using a computer to decode the third decoding feature map F d3 Feature extraction is carried out through a fourth decoding network model to obtain a fourth decoding feature map F d4 The method comprises the steps of carrying out a first treatment on the surface of the Computer-implemented fourth decoding feature map F d4 Feature extraction is carried out through fifth convolution, and an output defogging image out is obtained;
Step 309, using a computer, the foggy training image I is processed by the EPDN teacher network model to obtain the defogging image out_EP output by the EPDN teacher network, and the feature map output by the global sub-generator in the EPDN teacher network model is recorded as the EPDN teacher network intermediate output feature map EP_1;
using a computer, the foggy training image I is processed by the PSD teacher network model to obtain the defogging image out_PS output by the PSD teacher network, and the feature map output by the main network in the PSD teacher network model is recorded as the PSD teacher network intermediate output feature map PS_2;
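Steps 301 to 309 amount to a four-scale encoder, a four-stage decoder, a final three-channel convolution, plus two frozen teacher passes (shown later in the training sketch). The PyTorch skeleton below sketches that data flow under the assumption that the scale and decoding network models of steps 201 and 202 are available as nn.Module instances; all class and attribute names are hypothetical.

```python
import torch.nn as nn

class StudentNet(nn.Module):
    """Data flow of steps 301-308; submodule implementations are elided."""
    def __init__(self, scale1, scale2, scale3, scale4,
                 dec1, dec2, dec3, dec4, final_conv):
        super().__init__()
        self.scale1, self.scale2, self.scale3, self.scale4 = scale1, scale2, scale3, scale4
        self.dec1, self.dec2, self.dec3, self.dec4 = dec1, dec2, dec3, dec4
        self.final_conv = final_conv  # the fifth convolution layer

    def forward(self, hazy):
        f_e1 = self.scale1(hazy)              # step 301
        f_e2 = self.scale2(f_e1)              # step 302 (0.5x skip handled inside)
        f_e3 = self.scale3(f_e2, f_e1)        # step 303 (0.5x and 0.25x skips)
        f_e4 = self.scale4(f_e3, f_e2, f_e1)  # step 304
        f_d1 = self.dec1(f_e4)                # step 305
        f_d2 = self.dec2(f_d1)                # step 306 (2x skip handled inside)
        f_d3 = self.dec3(f_d2, f_d1)          # step 307
        f_d4 = self.dec4(f_d3, f_d2, f_d1)    # step 308
        out = self.final_conv(f_d4)           # output defogging image
        return out, f_d2, f_d3                # f_d2 / f_d3 feed the distillation loss
```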
Step four, establishing a total loss function:
Step 401, using a computer, the perceptual loss function L_per is obtained according to L_per = Σ_{i=1}^{n} [1/(C_i·H_i·W_i)]·(Φ_i(gt), Φ_i(out))_L1; where i is a positive integer, n = 5, Φ_i(gt) denotes the feature map output by the Relu i_1 layer of the VGG19 network model for the fog-free training image gt corresponding to the foggy training image I, Φ_i(out) denotes the feature map output by the Relu i_1 layer of the VGG19 network model for the output defogging image out of the student network model, and 1 ≤ i ≤ 5; C_i, H_i and W_i denote the channel number, length and width of the feature map output by the Relu i_1 layer, respectively; (Φ_i(gt), Φ_i(out))_L1 denotes the Manhattan distance between the two feature maps output by the Relu i_1 layer of the VGG19 network model;
Step 402, using a computer, the distillation loss function L_dist is obtained according to L_dist = (out, out_EP)_L1 + (out, out_PS)_L1 + 0.25·(EP_1, F_d2)_L1 + 0.5·(PS_2, F_d3)_L1; where (out, out_EP)_L1 denotes the Manhattan distance between the output defogging image out of the student network model and the defogging image out_EP output by the EPDN teacher network, (out, out_PS)_L1 denotes the Manhattan distance between the output defogging image out of the student network model and the defogging image out_PS output by the PSD teacher network, (EP_1, F_d2)_L1 denotes the Manhattan distance between the EPDN teacher network intermediate output feature map EP_1 and the second decoding feature map F_d2 of the student network model, and (PS_2, F_d3)_L1 denotes the Manhattan distance between the PSD teacher network intermediate output feature map PS_2 and the third decoding feature map F_d3 of the student network model;
Step 403, using a computer, the total loss function L_loss is obtained according to L_loss = 0.1·L_per + L_dist.
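A hedged PyTorch rendering of steps 401 to 403 is sketched below. The Relu i_1 layer indices are assumptions based on the standard torchvision VGG19 layout, and F.l1_loss with mean reduction stands in for the per-layer normalization by C_i·H_i·W_i (it additionally averages over the batch).

```python
import torch.nn.functional as F
from torchvision.models import vgg19

# Indices of the Relu i_1 activations in torchvision's vgg19().features
# (assumed from its standard layout: relu1_1 .. relu5_1).
RELU_I1 = [1, 6, 11, 20, 29]
_vgg = vgg19(weights='DEFAULT').features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def vgg_feats(x):
    feats, h = [], x
    for i, layer in enumerate(_vgg):
        h = layer(h)
        if i in RELU_I1:
            feats.append(h)
    return feats

def perceptual_loss(out, gt):
    # Step 401: mean-reduced L1 per layer ~ Manhattan distance / (C_i*H_i*W_i)
    return sum(F.l1_loss(a, b) for a, b in zip(vgg_feats(out), vgg_feats(gt)))

def distill_loss(out, out_ep, out_ps, ep1, f_d2, ps2, f_d3):
    # Step 402: the 0.25 and 0.5 weights are the patent's
    return (F.l1_loss(out, out_ep) + F.l1_loss(out, out_ps)
            + 0.25 * F.l1_loss(ep1, f_d2) + 0.5 * F.l1_loss(ps2, f_d3))

def total_loss(out, gt, out_ep, out_ps, ep1, f_d2, ps2, f_d3):
    # Step 403: L_loss = 0.1 * L_per + L_dist
    return 0.1 * perceptual_loss(out, gt) + distill_loss(
        out, out_ep, out_ps, ep1, f_d2, ps2, f_d3)
```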
Training the student network model by the foggy training image:
Step 501, using a computer, the student network model is iteratively optimized with the Adam optimization algorithm and the total loss function L_loss until the whole training set has been traversed, completing one iteration of training;
Step 502, the iterative training of step 501 is repeated until the preset number of training iterations is reached, obtaining the trained student network model;
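Step five could look as follows in PyTorch. The Adam optimizer and the 30-iteration budget come from the patent (see the embodiment notes below); the learning rate, device handling and the convention that each teacher returns its defogged output together with its intermediate feature map are assumptions.

```python
import torch

def train(student, epdn, psd, loader, iterations=30, lr=1e-4, device='cuda'):
    """Step five: Adam plus L_loss; lr and device handling are assumptions."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    student.to(device).train()
    for _ in range(iterations):            # preset number of training iterations
        for hazy, clear in loader:         # one pass over the whole training set
            hazy, clear = hazy.to(device), clear.to(device)
            out, f_d2, f_d3 = student(hazy)
            with torch.no_grad():          # frozen teachers only supply targets
                out_ep, ep1 = epdn(hazy)   # assumed to return (image, EP_1)
                out_ps, ps2 = psd(hazy)    # assumed to return (image, PS_2)
            loss = total_loss(out, clear, out_ep, out_ps, ep1, f_d2, ps2, f_d3)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```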
step six, defogging the single image by using the trained student network model:
Using a computer, any foggy image is input into the trained student network model for defogging to obtain the corresponding fog-free image.
The above single image defogging method based on multi-teacher knowledge distillation is further characterized in that: in step 201, the number of convolution kernels in the first convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the number of convolution kernels in the second convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, and the padding is 1;
the PA-based RDB module in step 201 comprises a first Conv+ReLU layer, a Conv1 convolution layer, an RDB module, a Conv2 convolution layer and a Sigmoid activation function layer; the number of convolution kernels in the first Conv+ReLU layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1; the number of convolution kernels in the Conv1 convolution layer is 32, the size of the convolution kernels is 1×1, the sliding step size is 1, and the padding is 0; the number of convolution kernels in the Conv2 convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
The number of convolution kernels in the third convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step length is 2, and the padding is 1;
the number of convolution kernels in the fourth convolution layer is 256, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 2, and the padding is 1;
the feature fusion module in step 201 comprises a first Conv+InstanceNorm normalization+ReLU activation function layer and a second Conv+InstanceNorm normalization+ReLU activation function layer;
in step 202, the number of convolution kernels in the first transpose convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the second transpose convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the third transpose convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the fifth convolution layer is 3, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
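The hyperparameters above map directly onto nn.Conv2d and nn.ConvTranspose2d definitions ('out_padding' corresponds to PyTorch's output_padding). In the sketch below the output channel counts, kernel sizes, strides and paddings are the claimed values, while the input channel counts are inferred from the surrounding architecture rather than stated in the patent.

```python
import torch.nn as nn

# Encoder convolutions: claimed kernel counts, 3x3 kernels, strides, paddings;
# the input channel counts are inferences, not claimed values.
conv1 = nn.Conv2d(3,   32,  3, stride=1, padding=1)   # first convolution layer
conv2 = nn.Conv2d(32,  64,  3, stride=2, padding=1)   # second (downsamples 2x)
conv3 = nn.Conv2d(64,  128, 3, stride=2, padding=1)   # third
conv4 = nn.Conv2d(128, 256, 3, stride=2, padding=1)   # fourth

# Decoder transpose convolutions ('out_padding' -> output_padding)
tconv1 = nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1)
tconv2 = nn.ConvTranspose2d(128, 64,  3, stride=2, padding=1, output_padding=1)
tconv3 = nn.ConvTranspose2d(64,  32,  3, stride=2, padding=1, output_padding=1)

# Fifth convolution layer: back to a 3-channel image
conv5 = nn.Conv2d(32, 3, 3, stride=1, padding=1)
```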
The above single image defogging method based on multi-teacher knowledge distillation is further characterized in that: in step 301, a computer is used to perform feature extraction on the foggy training image I through the first scale network model to obtain the first scale feature map F_e1; the specific process is as follows:
Step 3011, using a computer, feature extraction is performed on the foggy training image I through the first convolution layer to obtain the input feature map F_in;
Step 3012, the computer inputs the input feature map F_in into one PA-based RDB module for feature extraction to obtain the intermediate output feature map F_out;
Step 3013, following the method described in step 3012, the computer inputs the intermediate output feature map F_out into the other PA-based RDB module for feature extraction to obtain the first scale feature map F_e1.
The above single image defogging method based on multi-teacher knowledge distillation is further characterized in that: in step 302, the first scale feature map F_e1 is subjected to feature extraction through the second scale network model to obtain the second scale feature map F_e2; the specific process is as follows:
Step 3021, using a computer, the first scale feature map F_e1 is subjected to feature extraction through the second convolution layer to obtain a second input feature map;
step 3022, inputting the second input feature map into a PA-based RDB module in the second scale network model by the computer to perform feature extraction, so as to obtain a second scale first coding feature map;
step 3023, inputting the second-scale first coding feature map into another PA-based RDB module in the second-scale network model by the computer for feature extraction, so as to obtain a second-scale second coding feature map;
Step 3024, the computer maps the first scale feature map F e1 Performing 0.5 times downsampling to obtain a first downsampling feature map;
step 3025, calling a splicing cat function module by a computer to splice the first downsampling feature map and the second-scale second coding feature map to obtain a first spliced feature map;
step 3026, inputting the first spliced feature map into a feature fusion module in the second scale network model by using a computer to obtain a second scale feature map F e2
Computer-implemented step 303 of mapping the second scale feature map F e2 Feature extraction is carried out through a third-scale network model to obtain a third-scale feature map F e3 The specific process is as follows:
step 3031, using a computer to map the second scale feature map F e2 Extracting features through a third convolution layer to obtain a third input feature map;
step 3032, the computer inputs the third input feature map into a PA-based RDB module in the third-scale network model to perform feature extraction to obtain a third-scale first coding feature map;
Step 3033, the computer inputs the third-scale first coding feature map into the other PA-based RDB module in the third-scale network model for feature extraction to obtain a third-scale second coding feature map;
Step 3034, the computer performs 0.5× downsampling on the second scale feature map F_e2 to obtain a second downsampling feature map;
the computer performs 0.25× downsampling on the first scale feature map F_e1 to obtain a third downsampling feature map;
step 3035, a computer is adopted to call a splicing cat function module to splice the second downsampling feature map, the third downsampling feature map and the third-scale second coding feature map to obtain a second spliced feature map;
Step 3036, using a computer, the second spliced feature map is input into the feature fusion module in the third-scale network model to obtain the third scale feature map F_e3;
In step 304, the third scale feature map F_e3 is subjected to feature extraction through the fourth scale network model to obtain the fourth scale feature map F_e4; the specific process is as follows:
Step 3041, using a computer, the third scale feature map F_e3 is subjected to feature extraction through the fourth convolution layer to obtain a fourth input feature map;
step 3042, inputting the fourth input feature map into a PA-based RDB module in a fourth-scale network model by a computer to perform feature extraction, so as to obtain a fourth-scale first coding feature map;
step 3043, inputting the fourth-scale first coding feature map into another RDB module based on PA in the fourth-scale network model by a computer for feature extraction to obtain a fourth-scale second coding feature map;
Step 3044, the computer performs 0.5× downsampling on the third scale feature map F_e3 to obtain a fourth downsampling feature map;
the computer performs 0.25× downsampling on the second scale feature map F_e2 to obtain a fifth downsampling feature map;
the computer performs 0.125× downsampling on the first scale feature map F_e1 to obtain a sixth downsampling feature map;
step 3045, calling a splicing cat function module by a computer to splice the fourth downsampling feature map, the fifth downsampling feature map, the sixth downsampling feature map and the fourth-scale second coding feature map to obtain a third spliced feature map;
Step 3046, using a computer, the third spliced feature map is input into the feature fusion module in the fourth-scale network model to obtain the fourth scale feature map F_e4.
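Steps 3041 to 3046 downsample every earlier scale to the current resolution, concatenate along the channel axis, and fuse. A sketch of the fourth-scale case follows; bilinear interpolation is an assumed choice of downsampling, and all module arguments are hypothetical handles to the layers defined in step 201.

```python
import torch
import torch.nn.functional as F

def fourth_scale_forward(f_e1, f_e2, f_e3, conv4, pa_rdb_a, pa_rdb_b, fusion):
    x = conv4(f_e3)                          # step 3041: fourth input feature map
    x = pa_rdb_b(pa_rdb_a(x))                # steps 3042-3043: two PA-based RDBs
    d4 = F.interpolate(f_e3, scale_factor=0.5, mode='bilinear', align_corners=False)
    d5 = F.interpolate(f_e2, scale_factor=0.25, mode='bilinear', align_corners=False)
    d6 = F.interpolate(f_e1, scale_factor=0.125, mode='bilinear', align_corners=False)
    cat = torch.cat([d4, d5, d6, x], dim=1)  # step 3045: channel concatenation
    return fusion(cat)                       # step 3046: fourth scale feature map F_e4
```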
The above single image defogging method based on multi-teacher knowledge distillation is further characterized in that: in step 3012, the computer inputs the input feature map F_in into the PA-based RDB module for feature extraction to obtain the intermediate output feature map F_out; the specific process is as follows:
Step A, the computer performs feature extraction on the input feature map F_in through the first Conv+ReLU layer to obtain the feature map F_pre;
Step B, the computer inputs the feature map F_pre into the Conv1 convolution layer and the RDB module for feature extraction to obtain the feature map F_RDB; at the same time, the feature map F_pre is input into the Conv2 convolution layer for convolution processing and normalized by the Sigmoid activation function to obtain the spatial weight map F_s;
Step C, the computer is according toObtaining a characteristic diagram F mid The method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Hadamard product operation between matrices representing feature maps,/->Representing addition operations between feature map matrices;
Step D, the computer obtains the intermediate output feature map F_out according to F_out = F_in ⊕ F_mid.
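Steps A to D could be realized as the following PyTorch module. The gating in step C follows the formula above; the final skip from F_in in step D is an assumption, since the patent's original formula image for that step is not reproduced here.

```python
import torch.nn as nn

class PABasedRDB(nn.Module):
    """Steps A-D: Conv+ReLU stem, an RDB branch gated by a Sigmoid spatial
    weight map, then residual additions. The 32-channel width follows the
    claimed kernel counts; the F_in skip in step D is an assumption."""
    def __init__(self, rdb, channels=32):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(channels, channels, 3, 1, 1), nn.ReLU())
        self.conv1 = nn.Conv2d(channels, channels, 1, 1, 0)  # 1x1, feeds the RDB
        self.rdb = rdb                                       # residual dense block
        self.conv2 = nn.Conv2d(channels, channels, 3, 1, 1)  # weight-map branch
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_in):
        f_pre = self.stem(f_in)                 # step A
        f_rdb = self.rdb(self.conv1(f_pre))     # step B, main branch
        f_s = self.sigmoid(self.conv2(f_pre))   # step B, spatial weight map
        f_mid = f_rdb * f_s + f_pre             # step C: Hadamard product, then add
        return f_in + f_mid                     # step D (assumed residual skip)
```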
The above single image defogging method based on multi-teacher knowledge distillation is further characterized in that: in step 3026, step 3036 and step 3046, the first spliced feature map, the second spliced feature map and the third spliced feature map are each recorded as the spliced feature map, and the second scale feature map F_e2, the third scale feature map F_e3 and the fourth scale feature map F_e4 are each recorded as the fused scale feature map; the computer inputs the spliced feature map into the feature fusion module to obtain the fused scale feature map, the specific process being as follows:
Step A1, using a computer, the spliced feature map is subjected to feature processing through the first Conv+InstanceNorm normalization+ReLU activation function layer to obtain a fusion coding feature map;
Step A2, using a computer, the fusion coding feature map is subjected to feature processing through the second Conv+InstanceNorm normalization+ReLU activation function layer to obtain the fused scale feature map.
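Steps A1 and A2 describe a two-stage Conv + InstanceNorm + ReLU block. A possible factory function is sketched below; the convention that Conv3 keeps the concatenated channel count and Conv4 projects down to the scale width matches the kernel counts listed in the embodiment (e.g. 96 then 64 at the second scale).

```python
import torch.nn as nn

def make_fusion(cat_ch, out_ch):
    """Two Conv + InstanceNorm + ReLU stages (steps A1-A2): Conv3 keeps the
    concatenated width, Conv4 projects down to the scale width."""
    return nn.Sequential(
        nn.Conv2d(cat_ch, cat_ch, 3, 1, 1), nn.InstanceNorm2d(cat_ch), nn.ReLU(),
        nn.Conv2d(cat_ch, out_ch, 3, 1, 1), nn.InstanceNorm2d(out_ch), nn.ReLU(),
    )

fuse2 = make_fusion(96, 64)  # second-scale fusion: 32 + 64 concatenated channels
```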
The above single image defogging method based on multi-teacher knowledge distillation is further characterized in that: in step 305, the fourth scale feature map F_e4 is subjected to feature extraction through the first decoding network model to obtain the first decoding feature map F_d1; the specific process is as follows:
Step 3051, using a computer, the fourth scale feature map F_e4 is subjected to feature extraction through one PA-based RDB module in the first decoding network model to obtain a first pre-decoding feature map;
Step 3052, the computer inputs the first pre-decoding feature map into the other PA-based RDB module in the first decoding network model for feature extraction to obtain the first decoding feature map F_d1;
Step 306, the specific process is as follows:
Step 3061, using a computer, the first decoding feature map F_d1 is subjected to feature extraction through the first transpose convolution layer to obtain a first decoding first up-sampling feature map;
Step 3062, using a computer, the first decoding first up-sampling feature map is subjected to feature extraction through the two PA-based RDB modules in the second decoding network model to obtain a first intermediate feature map;
Step 3063, using a computer, the first decoding feature map F_d1 is subjected to 2× up-sampling to obtain a first decoding second up-sampling feature map;
Step 3064, the computer calls the splicing cat function module to splice the first intermediate feature map and the first decoding second up-sampling feature map to obtain a first decoding spliced feature map;
Step 3065, using a computer, the first decoding spliced feature map is input into the feature fusion module in the second decoding network model to obtain the second decoding feature map F_d2;
Step 307, the specific process is as follows:
Step 3071, using a computer, the second decoding feature map F_d2 is subjected to feature extraction through the second transpose convolution layer to obtain a second decoding first up-sampling feature map;
Step 3072, using a computer, the second decoding first up-sampling feature map is subjected to feature extraction through the two PA-based RDB modules in the third decoding network model to obtain a second intermediate feature map;
Step 3073, using a computer, the first decoding feature map F_d1 is subjected to 4× up-sampling to obtain a second decoding second up-sampling feature map;
the second decoding feature map F_d2 is subjected to 2× up-sampling to obtain a second decoding third up-sampling feature map;
Step 3074, the computer calls the splicing cat function module to splice the second intermediate feature map, the second decoding second up-sampling feature map and the second decoding third up-sampling feature map to obtain a second decoding spliced feature map;
Step 3075, using a computer, the second decoding spliced feature map is input into the feature fusion module in the third decoding network model to obtain the third decoding feature map F_d3;
Step 308, the specific process is as follows:
Step 3081, using a computer, the third decoding feature map F_d3 is subjected to feature extraction through the third transpose convolution layer to obtain a third decoding first up-sampling feature map;
Step 3082, using a computer, the third decoding first up-sampling feature map is subjected to feature extraction through the two PA-based RDB modules in the fourth decoding network model to obtain a third intermediate feature map;
Step 3083, using a computer, the first decoding feature map F_d1 is subjected to 8× up-sampling to obtain a third decoding second up-sampling feature map;
the second decoding feature map F_d2 is subjected to 4× up-sampling to obtain a third decoding third up-sampling feature map;
the third decoding feature map F_d3 is subjected to 2× up-sampling to obtain a third decoding fourth up-sampling feature map;
Step 3084, the computer calls the splicing cat function module to splice the third intermediate feature map, the third decoding second up-sampling feature map, the third decoding third up-sampling feature map and the third decoding fourth up-sampling feature map to obtain a third decoding spliced feature map;
Step 3085, using a computer, the third decoding spliced feature map is input into the feature fusion module in the fourth decoding network model to obtain the fourth decoding feature map F_d4.
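The decoding stages mirror the encoder: a transpose convolution, two PA-based RDB modules, up-sampling of all earlier decoding feature maps, concatenation and fusion. A sketch of the fourth decoding stage (steps 3081 to 3085) follows, again with bilinear interpolation as an assumed up-sampling and hypothetical module handles.

```python
import torch
import torch.nn.functional as F

def fourth_decoding_forward(f_d1, f_d2, f_d3, tconv3, pa_rdb_a, pa_rdb_b, fusion):
    x = pa_rdb_b(pa_rdb_a(tconv3(f_d3)))     # steps 3081-3082
    u1 = F.interpolate(f_d1, scale_factor=8, mode='bilinear', align_corners=False)
    u2 = F.interpolate(f_d2, scale_factor=4, mode='bilinear', align_corners=False)
    u3 = F.interpolate(f_d3, scale_factor=2, mode='bilinear', align_corners=False)
    cat = torch.cat([x, u1, u2, u3], dim=1)  # step 3084: channel concatenation
    return fusion(cat)                       # step 3085: fourth decoding feature map F_d4
```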
Compared with the prior art, the invention has the following advantages:
1. The method has simple steps and a reasonable design: first, the training set images are acquired; then the student network model is established, features are extracted from the foggy training images, the total loss function is established, and the student network model is trained with the foggy training images; finally, a single image is defogged using the trained student network model, improving the defogging effect of the image.
2. The student network model adopts feature-attention residual dense blocks for multi-scale feature extraction and generates the defogged image end to end, thereby exploiting the advantages of neural networks and providing stronger generalization capability.
3. The encoder model in the student network model comprises the first, second, third and fourth scale network models, and the decoder model comprises the first, second, third and fourth decoding network models. Four-scale downsampling feature extraction is realized by the encoder model and four-scale upsampling feature extraction by the decoder model, so that multi-scale information of the image is extracted, global and local features are effectively fused, and the defogging effect is further improved.
4. The invention adopts the EPDN teacher network model and the PSD teacher network model to realize knowledge migration from the teacher networks to the student network through multi-teacher knowledge distillation, so that the student network combines the complementary advantages of prior-information-based and deep-learning-based image defogging methods.
In summary, the method has simple steps and reasonable design, the EPDN teacher network model and the PSD teacher network model are used for guiding and training the student network model, the feature extraction capability of the student network is effectively improved, the student network model is used for extracting multi-scale information of the defogging image through four-scale encoding and decoding, the global and local features of the defogging image are effectively fused, and the defogging effect of the image is further improved.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
Fig. 2 is a schematic diagram of the structure of the student network model of the present invention.
Fig. 3 is a schematic structural diagram of the feature-attention residual dense block of the present invention.
Fig. 4 is a schematic structural diagram of a feature fusion module according to the present invention.
Detailed Description
As shown in fig. 1 to 4, the single image defogging method based on multi-teacher knowledge distillation of the present invention comprises the following steps:
step one, acquiring a training set image:
selecting an indoor training set from the foggy day image database RESIDE; the indoor training set comprises foggy training images and the corresponding fog-free training images, and the number of foggy training images equals the number of fog-free training images;
step two, establishing a student network model:
the method for establishing the student network model comprises the following specific processes:
step 201, establishing an encoder model of a student network by adopting a computer; the encoder model of the student network comprises a first scale network model, a second scale network model, a third scale network model and a fourth scale network model, wherein the first scale network model comprises a first convolution layer and two RDB modules based on PA, and the second scale network model comprises a second convolution layer, two RDB modules based on PA and a feature fusion module; the third scale network model comprises a third convolution layer, two RDB modules based on PA and a feature fusion module; the fourth scale network model comprises a fourth convolution layer, two RDB modules based on PA and a feature fusion module;
Step 202, adopting a computer to establish a decoder model of the student network; the decoder model of the student network comprises a first decoding network model, a second decoding network model, a third decoding network model, a fourth decoding network model and a fifth convolution layer, wherein the first decoding network model comprises two RDB modules based on PA, and the second decoding network model comprises a first transpose convolution layer, two RDB modules based on PA and a feature fusion module; the third decoding network model comprises a second transpose convolution layer, two RDB modules based on PA and a feature fusion module; the fourth decoding network model comprises a third transpose convolution layer, two RDB modules based on PA and a feature fusion module;
step three, extracting features of the foggy training images:
Step 301, using a computer, the foggy training image I is subjected to feature extraction through the first scale network model to obtain the first scale feature map F_e1;
Step 302, using a computer, the first scale feature map F_e1 is subjected to feature extraction through the second scale network model to obtain the second scale feature map F_e2;
Step 303, using a computer, the second scale feature map F_e2 is subjected to feature extraction through the third scale network model to obtain the third scale feature map F_e3;
Step 304, using a computer, the third scale feature map F_e3 is subjected to feature extraction through the fourth scale network model to obtain the fourth scale feature map F_e4;
Step 305, using a computer, the fourth scale feature map F_e4 is subjected to feature extraction through the first decoding network model to obtain the first decoding feature map F_d1;
Step 306, using a computer, the first decoding feature map F_d1 is subjected to feature extraction through the second decoding network model to obtain the second decoding feature map F_d2;
Step 307, using a computer, the second decoding feature map F_d2 is subjected to feature extraction through the third decoding network model to obtain the third decoding feature map F_d3;
Step 308, using a computer, the third decoding feature map F_d3 is subjected to feature extraction through the fourth decoding network model to obtain the fourth decoding feature map F_d4; using a computer, the fourth decoding feature map F_d4 is subjected to feature extraction through the fifth convolution layer to obtain the output defogging image out;
Step 309, using a computer, the foggy training image I is processed by the EPDN teacher network model to obtain the defogging image out_EP output by the EPDN teacher network, and the feature map output by the global sub-generator in the EPDN teacher network model is recorded as the EPDN teacher network intermediate output feature map EP_1;
using a computer, the foggy training image I is processed by the PSD teacher network model to obtain the defogging image out_PS output by the PSD teacher network, and the feature map output by the main network in the PSD teacher network model is recorded as the PSD teacher network intermediate output feature map PS_2;
Step four, establishing a total loss function:
Step 401, using a computer, the perceptual loss function L_per is obtained according to L_per = Σ_{i=1}^{n} [1/(C_i·H_i·W_i)]·(Φ_i(gt), Φ_i(out))_L1; where i is a positive integer, n = 5, Φ_i(gt) denotes the feature map output by the Relu i_1 layer of the VGG19 network model for the fog-free training image gt corresponding to the foggy training image I, Φ_i(out) denotes the feature map output by the Relu i_1 layer of the VGG19 network model for the output defogging image out of the student network model, and 1 ≤ i ≤ 5; C_i, H_i and W_i denote the channel number, length and width of the feature map output by the Relu i_1 layer, respectively; (Φ_i(gt), Φ_i(out))_L1 denotes the Manhattan distance between the two feature maps output by the Relu i_1 layer of the VGG19 network model;
Step 402, using a computer, the distillation loss function L_dist is obtained according to L_dist = (out, out_EP)_L1 + (out, out_PS)_L1 + 0.25·(EP_1, F_d2)_L1 + 0.5·(PS_2, F_d3)_L1; where (out, out_EP)_L1 denotes the Manhattan distance between the output defogging image out of the student network model and the defogging image out_EP output by the EPDN teacher network, (out, out_PS)_L1 denotes the Manhattan distance between the output defogging image out of the student network model and the defogging image out_PS output by the PSD teacher network, (EP_1, F_d2)_L1 denotes the Manhattan distance between the EPDN teacher network intermediate output feature map EP_1 and the second decoding feature map F_d2 of the student network model, and (PS_2, F_d3)_L1 denotes the Manhattan distance between the PSD teacher network intermediate output feature map PS_2 and the third decoding feature map F_d3 of the student network model;
Step 403, using a computer, the total loss function L_loss is obtained according to L_loss = 0.1·L_per + L_dist.
Training the student network model by the foggy training image:
Step 501, using a computer, the student network model is iteratively optimized with the Adam optimization algorithm and the total loss function L_loss until the whole training set has been traversed, completing one iteration of training;
Step 502, the iterative training of step 501 is repeated until the preset number of training iterations is reached, obtaining the trained student network model;
step six, defogging the single image by using the trained student network model:
Using a computer, any foggy image is input into the trained student network model for defogging to obtain the corresponding fog-free image.
In this embodiment, in step 201, the number of convolution kernels in the first convolution layer is 32, the size of the convolution kernel is 3×3, the sliding step size is 1, and the padding is 1;
the number of convolution kernels in the second convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, and the padding is 1;
The PA-based RDB module in step 201 comprises a first Conv+ReLU layer, a Conv1 convolution layer, an RDB module, a Conv2 convolution layer and a Sigmoid activation function layer; the number of convolution kernels in the first Conv+ReLU layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1; the number of convolution kernels in the Conv1 convolution layer is 32, the size of the convolution kernels is 1×1, the sliding step size is 1, and the padding is 0; the number of convolution kernels in the Conv2 convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the number of convolution kernels in the third convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step length is 2, and the padding is 1;
the number of convolution kernels in the fourth convolution layer is 256, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 2, and the padding is 1;
the feature fusion module in step 201 comprises a first Conv+InstanceNorm normalization+ReLU activation function layer and a second Conv+InstanceNorm normalization+ReLU activation function layer;
in step 202, the number of convolution kernels in the first transpose convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the second transpose convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
The number of convolution kernels in the third transpose convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the fifth convolution layer is 3, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
In this embodiment, in step 301, a computer is used to perform feature extraction on the foggy training image I through the first scale network model to obtain the first scale feature map F_e1; the specific process is as follows:
Step 3011, using a computer, feature extraction is performed on the foggy training image I through the first convolution layer to obtain the input feature map F_in;
Step 3012, the computer inputs the input feature map F_in into one PA-based RDB module for feature extraction to obtain the intermediate output feature map F_out;
Step 3013, following the method described in step 3012, the computer inputs the intermediate output feature map F_out into the other PA-based RDB module for feature extraction to obtain the first scale feature map F_e1.
In this embodiment, in step 302, the first scale feature map F_e1 is subjected to feature extraction through the second scale network model to obtain the second scale feature map F_e2; the specific process is as follows:
Step 3021, using a computer, the first scale feature map F_e1 is subjected to feature extraction through the second convolution layer to obtain a second input feature map;
step 3022, inputting the second input feature map into a PA-based RDB module in the second scale network model by the computer to perform feature extraction, so as to obtain a second scale first coding feature map;
step 3023, inputting the second-scale first coding feature map into another PA-based RDB module in the second-scale network model by the computer for feature extraction, so as to obtain a second-scale second coding feature map;
Step 3024, the computer performs 0.5× downsampling on the first scale feature map F_e1 to obtain a first downsampling feature map;
Step 3025, the computer calls the splicing cat function module to splice the first downsampling feature map and the second-scale second coding feature map to obtain a first spliced feature map;
Step 3026, using a computer, the first spliced feature map is input into the feature fusion module in the second scale network model to obtain the second scale feature map F_e2;
In step 303, the second scale feature map F_e2 is subjected to feature extraction through the third scale network model to obtain the third scale feature map F_e3; the specific process is as follows:
Step 3031, using a computer, the second scale feature map F_e2 is subjected to feature extraction through the third convolution layer to obtain a third input feature map;
step 3032, the computer inputs the third input feature map into a PA-based RDB module in the third-scale network model to perform feature extraction to obtain a third-scale first coding feature map;
Step 3033, the computer inputs the third-scale first coding feature map into the other PA-based RDB module in the third-scale network model for feature extraction to obtain a third-scale second coding feature map;
Step 3034, the computer performs 0.5× downsampling on the second scale feature map F_e2 to obtain a second downsampling feature map;
the computer performs 0.25× downsampling on the first scale feature map F_e1 to obtain a third downsampling feature map;
step 3035, a computer is adopted to call a splicing cat function module to splice the second downsampling feature map, the third downsampling feature map and the third-scale second coding feature map to obtain a second spliced feature map;
Step 3036, using a computer, the second spliced feature map is input into the feature fusion module in the third-scale network model to obtain the third scale feature map F_e3;
In step 304, the third scale feature map F_e3 is subjected to feature extraction through the fourth scale network model to obtain the fourth scale feature map F_e4; the specific process is as follows:
Step 3041, using a computer, the third scale feature map F_e3 is subjected to feature extraction through the fourth convolution layer to obtain a fourth input feature map;
step 3042, inputting the fourth input feature map into a PA-based RDB module in a fourth-scale network model by a computer to perform feature extraction, so as to obtain a fourth-scale first coding feature map;
step 3043, inputting the fourth-scale first coding feature map into another RDB module based on PA in the fourth-scale network model by a computer for feature extraction to obtain a fourth-scale second coding feature map;
Step 3044, the computer performs 0.5× downsampling on the third scale feature map F_e3 to obtain a fourth downsampling feature map;
the computer performs 0.25× downsampling on the second scale feature map F_e2 to obtain a fifth downsampling feature map;
the computer performs 0.125× downsampling on the first scale feature map F_e1 to obtain a sixth downsampling feature map;
step 3045, calling a splicing cat function module by a computer to splice the fourth downsampling feature map, the fifth downsampling feature map, the sixth downsampling feature map and the fourth-scale second coding feature map to obtain a third spliced feature map;
Step 3046, using a computer, the third spliced feature map is input into the feature fusion module in the fourth-scale network model to obtain the fourth scale feature map F_e4.
In this embodiment, in step 3012 the computer inputs the input feature map F_in into the PA-based RDB module for feature extraction to obtain the intermediate output feature map F_out; the specific process is as follows:
Step A, the computer performs feature extraction on the input feature map F_in through the first Conv+ReLU layer to obtain the feature map F_pre;
Step B, the computer inputs the feature map F_pre into the Conv1 convolution layer and the RDB module for feature extraction to obtain the feature map F_RDB; at the same time, the feature map F_pre is input into the Conv2 convolution layer for convolution processing and normalized by the Sigmoid activation function to obtain the spatial weight map F_s;
Step C, the computer is according toObtaining a characteristic diagram F mid The method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Hadamard product operation between matrices representing feature maps,/->Representing addition operations between feature map matrices;
Step D, the computer obtains the intermediate output feature map F_out according to F_out = F_in ⊕ F_mid.
In this embodiment, in step 3026, step 3036 and step 3046, the first spliced feature map, the second spliced feature map and the third spliced feature map are each recorded as the spliced feature map, and the second scale feature map F_e2, the third scale feature map F_e3 and the fourth scale feature map F_e4 are each recorded as the fused scale feature map; the computer inputs the spliced feature map into the feature fusion module to obtain the fused scale feature map, the specific process being as follows:
Step A1, using a computer, the spliced feature map is subjected to feature processing through the first Conv+InstanceNorm normalization+ReLU activation function layer to obtain a fusion coding feature map;
Step A2, using a computer, the fusion coding feature map is subjected to feature processing through the second Conv+InstanceNorm normalization+ReLU activation function layer to obtain the fused scale feature map.
In this embodiment, in step 305, the fourth scale feature map F_e4 is subjected to feature extraction through the first decoding network model to obtain the first decoding feature map F_d1; the specific process is as follows:
Step 3051, using a computer, the fourth scale feature map F_e4 is subjected to feature extraction through one PA-based RDB module in the first decoding network model to obtain a first pre-decoding feature map;
Step 3052, the computer inputs the first pre-decoding feature map into the other PA-based RDB module in the first decoding network model for feature extraction to obtain the first decoding feature map F_d1;
Step 306, the specific process is as follows:
Step 3061, using a computer, the first decoding feature map F_d1 is subjected to feature extraction through the first transpose convolution layer to obtain a first decoding first up-sampling feature map;
Step 3062, using a computer, the first decoding first up-sampling feature map is subjected to feature extraction through the two PA-based RDB modules in the second decoding network model to obtain a first intermediate feature map;
Step 3063, using a computer, the first decoding feature map F_d1 is subjected to 2× up-sampling to obtain a first decoding second up-sampling feature map;
Step 3064, the computer calls the splicing cat function module to splice the first intermediate feature map and the first decoding second up-sampling feature map to obtain a first decoding spliced feature map;
Step 3065, using a computer, the first decoding spliced feature map is input into the feature fusion module in the second decoding network model to obtain the second decoding feature map F_d2;
Step 307, the specific process is as follows:
Step 3071, using a computer, the second decoding feature map F_d2 is subjected to feature extraction through the second transpose convolution layer to obtain a second decoding first up-sampling feature map;
Step 3072, using a computer, the second decoding first up-sampling feature map is subjected to feature extraction through the two PA-based RDB modules in the third decoding network model to obtain a second intermediate feature map;
Step 3073, using a computer, the first decoding feature map F_d1 is subjected to 4× up-sampling to obtain a second decoding second up-sampling feature map;
the second decoding feature map F_d2 is subjected to 2× up-sampling to obtain a second decoding third up-sampling feature map;
Step 3074, the computer calls the splicing cat function module to splice the second intermediate feature map, the second decoding second up-sampling feature map and the second decoding third up-sampling feature map to obtain a second decoding spliced feature map;
Step 3075, using a computer, the second decoding spliced feature map is input into the feature fusion module in the third decoding network model to obtain the third decoding feature map F_d3;
Step 308, the specific process is as follows:
Step 3081, using a computer, the third decoding feature map F_d3 is subjected to feature extraction through the third transpose convolution layer to obtain a third decoding first up-sampling feature map;
Step 3082, using a computer, the third decoding first up-sampling feature map is subjected to feature extraction through the two PA-based RDB modules in the fourth decoding network model to obtain a third intermediate feature map;
Step 3083, using a computer, the first decoding feature map F_d1 is subjected to 8× up-sampling to obtain a third decoding second up-sampling feature map;
the second decoding feature map F_d2 is subjected to 4× up-sampling to obtain a third decoding third up-sampling feature map;
the third decoding feature map F_d3 is subjected to 2× up-sampling to obtain a third decoding fourth up-sampling feature map;
Step 3084, the computer calls the splicing cat function module to splice the third intermediate feature map, the third decoding second up-sampling feature map, the third decoding third up-sampling feature map and the third decoding fourth up-sampling feature map to obtain a third decoding spliced feature map;
Step 3085, using a computer, the third decoding spliced feature map is input into the feature fusion module in the fourth decoding network model to obtain the fourth decoding feature map F_d4.
In this embodiment, it should be noted that the structures of the feature fusion modules in the second scale network model, the third scale network model and the fourth scale network model are the same and only the number of convolution kernels is different.
In this embodiment, it should be noted that the structures of the feature fusion modules in the second decoding network model, the third decoding network model, and the fourth decoding network model are the same and only the number of convolution kernels is different.
In this embodiment, the convolution layer in the first conv+instancenorm normalization+relu activation function layer in the second scale network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 96, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the convolution layer in the second Conv+InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
The convolution layer in the first Conv+InstanceNorm normalization+ReLU activation function layer in the third scale network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 224, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the convolution layer in the second Conv+InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
The convolution layer in the first Conv+InstanceNorm normalization+ReLU activation function layer in the fourth scale network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 480, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the convolution layer in the second Conv+InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 256, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
The convolution layer in the first Conv+InstanceNorm normalization+ReLU activation function layer in the second decoding network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 384, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the convolution layer in the second Conv+InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
The convolution layer in the first Conv+InstanceNorm normalization+ReLU activation function layer in the third decoding network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 448, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the convolution layer in the second Conv+InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
The convolution layer in the first Conv+InstanceNorm normalization+ReLU activation function layer in the fourth decoding network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 480, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the convolution layer in the second Conv+InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
In this embodiment, the PA module is a spatial attention mechanism module, and RDB denotes a residual dense block (Residual Dense Block).
In this embodiment, it should be noted that the Adam optimization algorithm (Adaptive Moment Estimation) is a first-order optimization algorithm that can replace the conventional stochastic gradient descent process and iteratively updates the model parameters based on the training data.
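As a minimal sketch of how such an optimizer is typically driven in step 501, assuming a hypothetical `student` network, a `train_loader` of paired images, and a `total_loss` implementing the L_loss of step 403; the learning rate and betas below are illustrative defaults, not values taken from this embodiment:

```python
import torch

# Hypothetical student network and paired (foggy, fog-free) data loader.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4, betas=(0.9, 0.999))

for hazy, clear in train_loader:
    out = student(hazy)              # forward pass of the student network
    loss = total_loss(out, clear)    # L_loss = 0.1*L_per + L_dist (see step 403)
    optimizer.zero_grad()
    loss.backward()                  # first-order gradients
    optimizer.step()                 # adaptive moment update of the parameters
```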
In this embodiment, the number of foggy training images and the number of corresponding fog-free training images are both 13990.
In this embodiment, it should be noted that, in actual use, the symbol ⊙ refers to the Hadamard product between image matrices; for example, if the element in row i and column j of matrix A is a_ij, and the element in row i and column j of matrix B is b_ij, then C = A ⊙ B has elements c_ij = a_ij × b_ij, where A, B and C are matrices of the same dimensions.
In this embodiment, it should be noted that the preset number of iterative training rounds in step 502 is 30.
In this embodiment, when i = 1, Φ_1(gt) represents the feature map of the fog-free training image gt output by the Relu1_1 layer in the VGG19 network model, and Φ_1(out) represents the feature map of the defogged image out of the student network model output by the Relu1_1 layer in the VGG19 network model;
when i = 2, Φ_2(gt) and Φ_2(out) represent the corresponding feature maps output by the Relu2_1 layer in the VGG19 network model;
when i = 3, Φ_3(gt) and Φ_3(out) represent the corresponding feature maps output by the Relu3_1 layer;
when i = 4, Φ_4(gt) and Φ_4(out) represent the corresponding feature maps output by the Relu4_1 layer;
when i = 5, Φ_5(gt) and Φ_5(out) represent the corresponding feature maps output by the Relu5_1 layer.
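By way of non-limiting illustration, the Relu i_1 feature maps used above can be extracted from torchvision's VGG19 roughly as follows; the layer indices and the `weights` argument reflect common torchvision conventions and are assumptions of this sketch, not part of the described method:

```python
import torch
import torchvision

# Assumed positions of relu1_1 .. relu5_1 inside torchvision's vgg19().features.
RELU_I_1 = [1, 6, 11, 20, 29]

vgg = torchvision.models.vgg19(weights='IMAGENET1K_V1').features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)          # frozen loss network, no gradient updates

def vgg_features(x):
    """Return [Phi_1(x), ..., Phi_5(x)] for the perceptual loss."""
    feats = []
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in RELU_I_1:
            feats.append(x)
    return feats
```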
In this embodiment, the downsampling is nearest neighbor downsampling, and the upsampling is nearest neighbor upsampling.
In this embodiment, 0.5-times downsampling leaves the number of channels of the image unchanged and reduces the size of the image to 1/2 of the original; 0.25-times downsampling leaves the number of channels unchanged and reduces the size to 1/4 of the original; 0.125-times downsampling leaves the number of channels unchanged and reduces the size to 1/8 of the original.
In this embodiment, it should be noted that 2-times upsampling leaves the number of channels of the image unchanged and enlarges the size of the image to 2 times the original; 4-times upsampling leaves the number of channels unchanged and enlarges the size to 4 times the original; and 8-times upsampling leaves the number of channels unchanged and enlarges the size to 8 times the original.
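As a small illustrative check, nearest-neighbour resampling with these factors behaves exactly as described; `F.interpolate` is one common way to realize it (a sketch, not the embodiment's required implementation):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 32, 256, 256)   # batch x channels x height x width

# Channel count is preserved; only the spatial size changes.
d2 = F.interpolate(x, scale_factor=0.5,   mode='nearest')   # -> 1x32x128x128
d4 = F.interpolate(x, scale_factor=0.25,  mode='nearest')   # -> 1x32x64x64
d8 = F.interpolate(x, scale_factor=0.125, mode='nearest')   # -> 1x32x32x32
u2 = F.interpolate(x, scale_factor=2,     mode='nearest')   # -> 1x32x512x512
```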
In this embodiment, the foggy training image I is a three-channel RGB color image of size 3×256×256, and the corresponding fog-free training image is likewise of size 3×256×256.
In this embodiment, the EPDN teacher network output defogged image out_EP and the PSD teacher network output defogged image out_PS are both of size 3×256×256; the EPDN teacher network intermediate output feature map EP_1 has a size of 128×64×64, and the PSD teacher network intermediate output feature map PS_2 has a size of 64×128×128.
In the present embodiment, the feature maps F_pre, F_RDB, F_mid and F_s each have a size of 32×256×256.
In the present embodiment, the size of a feature map is expressed as the number of channels × length × width. The input feature map F_in has a size of 32×256×256, the output feature map F_out has a size of 32×256×256, and the first scale feature map F_e1 has a size of 32×256×256;
the second input feature map has a size of 64×128×128, the second-scale first coding feature map 64×128×128, the second-scale second coding feature map 64×128×128, the first downsampled feature map 32×128×128, the first spliced feature map 96×128×128, and the second scale feature map F_e2 64×128×128;
the third input feature map has a size of 128×64×64, the third-scale first coding feature map 128×64×64, the third-scale second coding feature map 128×64×64, the second downsampled feature map 64×64×64, the third downsampled feature map 32×64×64, the second spliced feature map 224×64×64, and the third scale feature map F_e3 128×64×64;
the fourth input feature map has a size of 256×32×32, the fourth-scale first coding feature map 256×32×32, the fourth-scale second coding feature map 256×32×32, the fourth downsampled feature map 128×32×32, the fifth downsampled feature map 64×32×32, the sixth downsampled feature map 32×32×32, the third spliced feature map 480×32×32, and the fourth scale feature map F_e4 256×32×32.
In the present embodiment, the first pre-decoding feature map and the first decoding feature map F_d1 each have a size of 256×32×32;
the first decoded first upsampled feature map has a size of 128×64×64, the first intermediate feature map 128×64×64, the first decoding second up-sampling feature map 256×64×64, the first decoding spliced feature map 384×64×64, and the second decoding feature map F_d2 128×64×64;
the second decoded first upsampled feature map has a size of 64×128×128, the second intermediate feature map 64×128×128, the second decoding second up-sampling feature map 256×128×128, the second decoding third up-sampling feature map 128×128×128, the second decoding spliced feature map 448×128×128, and the third decoding feature map F_d3 64×128×128;
the third decoded first upsampled feature map has a size of 32×256×256, the third intermediate feature map 32×256×256, the third decoding second up-sampling feature map 256×256×256, the third decoding third up-sampling feature map 128×256×256, the third decoding fourth up-sampling feature map 64×256×256, the third decoding spliced feature map 480×256×256, and the fourth decoding feature map F_d4 32×256×256;
the output defogged image out has a size of 3×256×256.
In this embodiment, it should be noted that the first decoding spliced feature map, the second decoding spliced feature map and the third decoding spliced feature map are each subjected to feature processing by the first Conv+InstanceNorm normalization+ReLU activation function layer and the second Conv+InstanceNorm normalization+ReLU activation function layer in the corresponding feature fusion module to obtain the second decoding feature map, the third decoding feature map and the fourth decoding feature map, respectively.
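A minimal sketch of such a feature fusion module, assuming it consists of exactly the two Conv+InstanceNorm+ReLU layers described above; the class name and constructor arguments are placeholders introduced here:

```python
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Two Conv + InstanceNorm + ReLU layers; mid_ch corresponds to the
    Conv3 kernel count and out_ch to the Conv4 kernel count listed above."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)
```

For example, FeatureFusion(480, 480, 32) would match the fourth decoding network model (480-channel spliced input, 32-channel F_d4), and FeatureFusion(96, 96, 64) the second scale network model.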
In summary, the method has simple steps and a reasonable design. The EPDN teacher network model and the PSD teacher network model jointly guide the training of the student network model, effectively improving the feature extraction capability of the student network; the student network model extracts multi-scale information of the foggy image through four-scale encoding and decoding, effectively fusing the global and local features of the image and further improving the image defogging effect.
The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and any simple modification, variation and equivalent structural changes made to the above embodiment according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (7)

1. A single image defogging method based on multi-teacher knowledge distillation, which is characterized by comprising the following steps:
step one, acquiring a training set image:
selecting an indoor training set from the foggy-day image database RESIDE; the indoor training set comprises foggy training images and fog-free training images corresponding to the foggy training images, wherein the number of foggy training images and the number of fog-free training images are the same;
Step two, establishing a student network model:
the method for establishing the student network model comprises the following specific processes:
step 201, establishing an encoder model of a student network by adopting a computer; the encoder model of the student network comprises a first scale network model, a second scale network model, a third scale network model and a fourth scale network model, wherein the first scale network model comprises a first convolution layer and two RDB modules based on PA, and the second scale network model comprises a second convolution layer, two RDB modules based on PA and a feature fusion module; the third scale network model comprises a third convolution layer, two RDB modules based on PA and a feature fusion module; the fourth scale network model comprises a fourth convolution layer, two RDB modules based on PA and a feature fusion module;
step 202, adopting a computer to establish a decoder model of the student network; the decoder model of the student network comprises a first decoding network model, a second decoding network model, a third decoding network model, a fourth decoding network model and a fifth convolution layer, wherein the first decoding network model comprises two PA-based RDB modules, and the second decoding network model comprises a first transpose convolution layer, two PA-based RDB modules and a feature fusion module; the third decoding network model comprises a second transpose convolution layer, two PA-based RDB modules and a feature fusion module; and the fourth decoding network model comprises a third transpose convolution layer, two PA-based RDB modules and a feature fusion module;
Step three, extracting features of the foggy training images:
step 301, using a computer to subject the foggy training image I to feature extraction through the first scale network model to obtain a first scale feature map F_e1;
step 302, using a computer to subject the first scale feature map F_e1 to feature extraction through the second scale network model to obtain a second scale feature map F_e2;
step 303, using a computer to subject the second scale feature map F_e2 to feature extraction through the third scale network model to obtain a third scale feature map F_e3;
step 304, using a computer to subject the third scale feature map F_e3 to feature extraction through the fourth scale network model to obtain a fourth scale feature map F_e4;
step 305, using a computer to subject the fourth scale feature map F_e4 to feature extraction through the first decoding network model to obtain a first decoding feature map F_d1;
step 306, using a computer to subject the first decoding feature map F_d1 to feature extraction through the second decoding network model to obtain a second decoding feature map F_d2;
step 307, using a computer to subject the second decoding feature map F_d2 to feature extraction through the third decoding network model to obtain a third decoding feature map F_d3;
step 308, using a computer to subject the third decoding feature map F_d3 to feature extraction through the fourth decoding network model to obtain a fourth decoding feature map F_d4, and to subject the fourth decoding feature map F_d4 to feature extraction through the fifth convolution layer to obtain an output defogged image out;
step 309, using a computer to process the foggy training image I with the EPDN teacher network model to obtain the EPDN teacher network output defogged image out_EP, and recording the feature map output by the global sub-generator in the EPDN teacher network model as the EPDN teacher network intermediate output feature map EP_1;
using a computer to process the foggy training image I with the PSD teacher network model to obtain the PSD teacher network output defogged image out_PS, and recording the feature map output by the main network in the PSD teacher network model as the PSD teacher network intermediate output feature map PS_2;
Step four, establishing a total loss function:
step 401, adopting a computer according to L_per = Σ_{i=1}^{n} (1/(C_i × H_i × W_i)) · (Φ_i(gt), Φ_i(out))_{L1} to obtain the perception loss function L_per; where i is a positive integer, n = 5, Φ_i(gt) represents the feature map of the fog-free training image gt corresponding to the foggy training image I output by the Relu i_1 layer in the VGG19 network model, Φ_i(out) represents the feature map of the defogged image out of the student network model output by the Relu i_1 layer in the VGG19 network model, and 1 ≤ i ≤ 5; C_i, H_i and W_i respectively represent the channel number, length and width of the feature map output by the Relu i_1 layer; (Φ_i(gt), Φ_i(out))_{L1} represents the Manhattan distance between the two feature maps output by the Relu i_1 layer in the VGG19 network model;
step 402, adopting a computer according to L_dist = (out, out_EP)_{L1} + (out, out_PS)_{L1} + 0.25·(EP_1, F_d2)_{L1} + 0.5·(PS_2, F_d3)_{L1} to obtain the distillation loss function L_dist; wherein (out, out_EP)_{L1} represents the Manhattan distance between the defogged image out output by the student network model and the EPDN teacher network output defogged image out_EP, (out, out_PS)_{L1} represents the Manhattan distance between the defogged image out output by the student network model and the PSD teacher network output defogged image out_PS, (EP_1, F_d2)_{L1} represents the Manhattan distance between the EPDN teacher network intermediate output feature map EP_1 and the second decoding feature map F_d2 of the student network model, and (PS_2, F_d3)_{L1} represents the Manhattan distance between the PSD teacher network intermediate output feature map PS_2 and the third decoding feature map F_d3 of the student network model;
step 403, adopting a computer according to L_loss = 0.1·L_per + L_dist to obtain the total loss function L_loss;
Training the student network model by the foggy training image:
step 501, adopting Adam optimization algorithm by computer and utilizing total loss function L loss Performing iterative optimization on the student network model until the training set is completely trained, and completing one-time iterative training;
Step 502, repeating the iterative training in step 501 until the iterative training preset times are met, and obtaining a trained student network model;
step six, defogging the single image by using the trained student network model:
and inputting any one foggy image into the trained student network model by adopting a computer for defogging processing, so as to obtain a fog-free image.
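Purely as a non-limiting illustration of steps 401 to 403, the loss terms could be assembled as below; `vgg_features` refers to the hypothetical extractor sketched earlier in this document, and the choice of L1 reductions is an assumption of this sketch:

```python
import torch.nn.functional as F

def perceptual_loss(out, gt):
    """L_per: Manhattan distance between VGG19 Relu i_1 feature maps,
    normalised by C_i * H_i * W_i (uses the vgg_features sketch above)."""
    loss = 0.0
    for f_out, f_gt in zip(vgg_features(out), vgg_features(gt)):
        c, h, w = f_out.shape[1:]
        loss = loss + F.l1_loss(f_gt, f_out, reduction='sum') / (c * h * w)
    return loss

def distillation_loss(out, out_ep, out_ps, ep1, f_d2, ps2, f_d3):
    """L_dist of step 402, with the 0.25 / 0.5 feature-level weights."""
    return (F.l1_loss(out, out_ep) + F.l1_loss(out, out_ps)
            + 0.25 * F.l1_loss(ep1, f_d2) + 0.5 * F.l1_loss(ps2, f_d3))

def total_loss(out, gt, out_ep, out_ps, ep1, f_d2, ps2, f_d3):
    """L_loss = 0.1 * L_per + L_dist (step 403)."""
    return 0.1 * perceptual_loss(out, gt) + distillation_loss(
        out, out_ep, out_ps, ep1, f_d2, ps2, f_d3)
```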
2. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 201, the number of convolution kernels in the first convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;
the number of convolution kernels in the second convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, and the padding is 1;
the PA-based RDB module in step 201 includes a first conv+relu layer, a Conv1 convolution layer, an RDB module, a Conv2 convolution layer, and a Sigmoid activation function layer; the number of convolution kernels in the first Conv+ReLU layer is 32, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1; the number of convolution kernels in the Conv1 convolution layer is 32, the size of the convolution kernels is 1 multiplied by 1, the sliding step length is 1, and the padding is 0; the number of convolution kernels in the Conv2 convolution layer is 32, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;
The number of convolution kernels in the third convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step length is 2, and the padding is 1;
the number of convolution kernels in the fourth convolution layer is 256, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 2, and the padding is 1;
the feature fusion module in step 201 comprises a first Conv+InstanceNorm normalization+ReLU activation function layer and a second Conv+InstanceNorm normalization+ReLU activation function layer;
in step 202, the number of convolution kernels in the first transpose convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the second transpose convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the third transpose convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;
the number of convolution kernels in the fifth convolution layer is 3, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.
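By way of non-limiting illustration, the layers whose hyper-parameters are recited in this claim can be written out as below; the input channel counts are not recited in the claim and are inferred from the feature-map sizes in the description, so they are assumptions of this sketch:

```python
import torch.nn as nn

# Encoder/decoder convolutions (kernel counts, 3x3 kernels, strides, padding as claimed).
first_conv  = nn.Conv2d(3,   32,  kernel_size=3, stride=1, padding=1)   # first convolution layer
second_conv = nn.Conv2d(32,  64,  kernel_size=3, stride=2, padding=1)   # second convolution layer
third_conv  = nn.Conv2d(64,  128, kernel_size=3, stride=2, padding=1)   # third convolution layer
fourth_conv = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)   # fourth convolution layer
fifth_conv  = nn.Conv2d(32,  3,   kernel_size=3, stride=1, padding=1)   # fifth convolution layer

# Transposed convolutions of the decoder (out_padding -> output_padding in PyTorch).
first_tconv  = nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, output_padding=1)
second_tconv = nn.ConvTranspose2d(128, 64,  kernel_size=3, stride=2, padding=1, output_padding=1)
third_tconv  = nn.ConvTranspose2d(64,  32,  kernel_size=3, stride=2, padding=1, output_padding=1)
```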
3. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 301, a computer is used to perform feature extraction on the foggy training image I through the first scale network model to obtain the first scale feature map F_e1, and the specific process is as follows:
step 3011, performing feature extraction on the foggy training image I through the first convolution layer by a computer to obtain an input feature map F_in;
step 3012, inputting, by the computer, the input feature map F_in into one PA-based RDB module for feature extraction to obtain an intermediate output feature map F_out;
step 3013, inputting, by the computer, the intermediate output feature map F_out into the other PA-based RDB module according to the method described in step 3012 for feature extraction to obtain the first scale feature map F_e1.
4. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 302, the first scale feature map F_e1 is subjected to feature extraction through the second scale network model by a computer to obtain the second scale feature map F_e2, and the specific process is as follows:
step 3021, using a computer to map the first scale feature map F e1 Extracting features through a second convolution layer to obtain a second input feature map;
step 3022, inputting the second input feature map into a PA-based RDB module in the second scale network model by the computer to perform feature extraction, so as to obtain a second scale first coding feature map;
step 3023, inputting the second-scale first coding feature map into another PA-based RDB module in the second-scale network model by the computer for feature extraction, so as to obtain a second-scale second coding feature map;
Step 3024, the computer maps the first scale feature map F e1 Performing 0.5 times downsampling to obtain a first downsampling feature map;
step 3025, calling a splicing cat function module by a computer to splice the first downsampling feature map and the second-scale second coding feature map to obtain a first spliced feature map;
step 3026, inputting the first spliced feature map into a feature fusion module in the second scale network model by using a computer to obtain a second scale feature map F e2
In step 303, the second scale feature map F_e2 is subjected to feature extraction through the third scale network model by a computer to obtain the third scale feature map F_e3, and the specific process is as follows:
step 3031, using a computer to map the second scale feature map F e2 Extracting features through a third convolution layer to obtain a third input feature map;
step 3032, the computer inputs the third input feature map into a PA-based RDB module in the third-scale network model to perform feature extraction to obtain a third-scale first coding feature map;
step 3033, the computer inputs the third-scale first coding feature map into another PA-based RDB module in the third-scale network model for feature extraction to obtain a third-scale second coding feature map;
Step 3034, the computer maps the second scale feature map F e2 Performing 0.5 times downsampling to obtain a second downsampling feature map;
the computer maps the first scale characteristic map F e1 Performing 0.25 times downsampling to obtain a third downsampling feature map;
step 3035, a computer is adopted to call a splicing cat function module to splice the second downsampling feature map, the third downsampling feature map and the third-scale second coding feature map to obtain a second spliced feature map;
step 3036, inputting the second spliced feature map into a feature fusion module in the third-scale network model by adopting a computer to obtain a third-scale feature map F e3
In step 304, the third scale feature map F_e3 is subjected to feature extraction through the fourth scale network model by a computer to obtain the fourth scale feature map F_e4, and the specific process is as follows:
step 3041, computer-implemented third scale feature map F e3 Extracting features through a fourth convolution layer to obtain a fourth input feature map;
step 3042, inputting the fourth input feature map into a PA-based RDB module in a fourth-scale network model by a computer to perform feature extraction, so as to obtain a fourth-scale first coding feature map;
step 3043, inputting the fourth-scale first coding feature map into another RDB module based on PA in the fourth-scale network model by a computer for feature extraction to obtain a fourth-scale second coding feature map;
Step 3044 computer maps the third scale feature map F e3 Performing 0.5 times downsampling to obtain a fourth downsampling feature map;
the computer maps the second scale characteristic map F e2 Performing 0.25 times downsampling to obtain a fifth downsampling characteristic map;
the computer maps the first scale characteristic map F e1 Performing 0.125 times downsampling to obtain a sixth downsampled feature map;
step 3045, calling a splicing cat function module by a computer to splice the fourth downsampling feature map, the fifth downsampling feature map, the sixth downsampling feature map and the fourth-scale second coding feature map to obtain a third spliced feature map;
step 3046, inputting the third spliced feature map into a feature fusion module in the fourth-scale network model by using a computer to obtain a fourth-scale feature map F e4
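By way of non-limiting illustration, the fourth-scale stage of steps 3041 to 3046 can be sketched as follows; `fourth_conv`, `rdb_pair` and `fusion` are hypothetical stand-ins for the fourth convolution layer, the two PA-based RDB modules and the feature fusion module of the fourth scale network model:

```python
import torch
import torch.nn.functional as F

def fourth_scale_stage(f_e1, f_e2, f_e3, fourth_conv, rdb_pair, fusion):
    """Sketch of steps 3041-3046 of claim 4 (placeholder modules)."""
    x = fourth_conv(f_e3)                                         # fourth input feature map, 256x32x32
    enc = rdb_pair(x)                                             # fourth-scale second coding feature map
    d4 = F.interpolate(f_e3, scale_factor=0.5,   mode='nearest')  # fourth downsampled map, 128x32x32
    d5 = F.interpolate(f_e2, scale_factor=0.25,  mode='nearest')  # fifth downsampled map, 64x32x32
    d6 = F.interpolate(f_e1, scale_factor=0.125, mode='nearest')  # sixth downsampled map, 32x32x32
    cat = torch.cat([d4, d5, d6, enc], dim=1)                     # third spliced map, 128+64+32+256 = 480 channels
    return fusion(cat)                                            # fourth scale feature map F_e4
```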
5. A single image defogging method based on multi-teacher knowledge distillation according to claim 3, characterized in that: in step 3012, the computer inputs the input feature map F_in into a PA-based RDB module for feature extraction to obtain the intermediate output feature map F_out, and the specific process is as follows:
step A, the computer subjects the input feature map F_in to feature extraction through the first Conv+ReLU layer to obtain a feature map F_pre;
step B, the computer inputs the feature map F_pre into the Conv1 convolution layer and the RDB module for feature extraction to obtain a feature map F_RDB; at the same time, the feature map F_pre is input into the Conv2 convolution layer for convolution processing and normalized by the Sigmoid activation function to obtain a spatial weight map F_s;
step C, the computer obtains a feature map F_mid according to F_mid = F_RDB ⊙ F_s ⊕ F_pre, wherein ⊙ represents the Hadamard product operation between feature map matrices, and ⊕ represents the addition operation between feature map matrices;
step D, the computer obtains the intermediate feature map F_out according to F_out = F_mid ⊕ F_in.
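As a non-limiting sketch of the PA-based RDB module of this claim, assuming the readings of the step C and step D formulas given above; the `rdb` argument stands in for any 32-channel residual dense block, whose internal layout the claim does not fix:

```python
import torch.nn as nn

class PARDB(nn.Module):
    """PA-based RDB module (sketch): steps A-D of claim 5."""
    def __init__(self, ch=32, rdb=None):
        super().__init__()
        self.head  = nn.Sequential(nn.Conv2d(ch, ch, 3, 1, 1),
                                   nn.ReLU(inplace=True))   # first Conv+ReLU layer
        self.conv1 = nn.Conv2d(ch, ch, 1, 1, 0)             # Conv1: 1x1, stride 1, padding 0
        self.rdb   = rdb if rdb is not None else nn.Identity()  # placeholder RDB
        self.conv2 = nn.Conv2d(ch, ch, 3, 1, 1)             # Conv2: 3x3, stride 1, padding 1
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_in):
        f_pre = self.head(f_in)                 # step A
        f_rdb = self.rdb(self.conv1(f_pre))     # step B: feature branch
        f_s   = self.sigmoid(self.conv2(f_pre)) # step B: spatial weight map
        f_mid = f_rdb * f_s + f_pre             # step C: Hadamard product plus skip
        return f_mid + f_in                     # step D: residual output F_out
```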
6. A single image defogging method based on multi-teacher knowledge distillation according to claim 3, characterized in that: in step 3026, step 3036, and step 3046, the first post-stitching feature map, the second post-stitching feature map, and the third post-stitching feature map are recorded as post-stitching feature maps, and the second scale feature map F e2 Third scale feature map F e3 And fourth scale feature map F e4 And (3) respectively recording the characteristic images as fused scale characteristic images, and inputting the spliced characteristic images into a characteristic fusion module by adopting a computer to obtain the scale characteristic images, wherein the specific process is as follows:
a1, performing feature processing on the spliced feature map through a first Conv+InstanceNorm normalization+ReLU activation function layer by adopting a computer to obtain a fusion coding feature map;
And A2, performing feature processing on the fusion coding feature map through a second Conv+InstanceNorm normalization+ReLU activation function layer by adopting a computer to obtain a fused scale feature map.
7. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 305, the fourth scale feature map F_e4 is subjected to feature extraction through the first decoding network model by a computer to obtain the first decoding feature map F_d1, and the specific process is as follows:
step 3051, using a computer to subject the fourth scale feature map F_e4 to feature extraction through one PA-based RDB module in the first decoding network model to obtain a first pre-decoding feature map;
step 3052, inputting the first pre-decoding feature map into another PA-based RDB module in the first decoding network model by the computer to perform feature extraction, thereby obtaining a first decoding feature map F d1
Step 306, the specific process is as follows:
step 3061, using a computer to subject the first decoding feature map F_d1 to feature extraction through the first transpose convolution layer to obtain a first decoded first upsampled feature map;
step 3062, performing feature extraction on the first decoded first upsampled feature map through two PA-based RDB modules in the second decoding network model by using a computer to obtain a first intermediate feature map;
Step 3063, using a computer to decode the first feature map F d1 2 times of up-sampling processing is carried out to obtain a first decoding second up-sampling feature map;
step 3064, a computer is adopted to call a splicing cat function module to splice the first intermediate feature map and the first decoding second up-sampling feature map, so as to obtain a first decoding splicing feature map;
step 3065, inputting the first decoding spliced feature map into a feature fusion module in the second decoding network model by using a computer to obtain a second decoding feature map F d2
Step 307, the specific process is as follows:
step 3071, using a computer to subject the second decoding feature map F_d2 to feature extraction through the second transpose convolution layer to obtain a second decoded first upsampled feature map;
step 3072, performing feature extraction on the second decoded first upsampled feature map by using a computer through two PA-based RDB modules in the third decoding network model to obtain a second intermediate feature map;
step 3073, using a computer to subject the first decoding feature map F_d1 to 4-times upsampling to obtain a second decoding second up-sampling feature map;
subjecting the second decoding feature map F_d2 to 2-times upsampling to obtain a second decoding third up-sampling feature map;
step 3074, calling a splicing cat function module by a computer to splice the second intermediate feature map, the second decoding second up-sampling feature map and the second decoding third up-sampling feature map to obtain a second decoding splicing feature map;
Step 3075, inputting the second decoding spliced feature map into a feature fusion module in the third decoding network model by using a computer to obtain a third decoding feature map F d3
Step 308, the specific process is as follows:
step 3081, using a computer to subject the third decoding feature map F_d3 to feature extraction through the third transpose convolution layer to obtain a third decoded first upsampled feature map;
step 3082, performing feature extraction on the third decoded first upsampled feature map by using a computer through the two PA-based RDB modules in the fourth decoding network model to obtain a third intermediate feature map;
step 3083, using a computer to subject the first decoding feature map F_d1 to 8-times upsampling to obtain a third decoding second up-sampling feature map;
subjecting the second decoding feature map F_d2 to 4-times upsampling to obtain a third decoding third up-sampling feature map;
subjecting the third decoding feature map F_d3 to 2-times upsampling to obtain a third decoding fourth up-sampling feature map;
step 3084, calling the splicing cat function module by a computer to splice the third intermediate feature map, the third decoding second up-sampling feature map, the third decoding third up-sampling feature map and the third decoding fourth up-sampling feature map to obtain a third decoding spliced feature map;
step 3085, inputting the third decoding spliced feature map into the feature fusion module in the fourth decoding network model by using a computer to obtain the fourth decoding feature map F_d4.
GR01 Patent grant