CN116862784B

CN116862784B - Single image defogging method based on multi-teacher knowledge distillation

Info

Publication number: CN116862784B
Application number: CN202310681883.6A
Authority: CN
Inventors: 兰云伟; 崔智高; 苏延召; 马铮; 蔡艳平; 王涛; 曹继平
Original assignee: Rocket Force University of Engineering of PLA
Current assignee: Rocket Force University of Engineering of PLA
Priority date: 2023-06-09
Filing date: 2023-06-09
Publication date: 2024-06-04
Anticipated expiration: 2043-06-09
Also published as: CN116862784A

Abstract

The invention discloses a single image defogging method based on multi-teacher knowledge distillation, which comprises the following steps: 1. acquiring a training set image; 2. establishing a student network model; 3. extracting features of the foggy training images; 4. establishing a total loss function; 5. training the student network model by the foggy training image; 6. defogging the single image by using the trained student network model. According to the invention, the student network model is guided and trained through the EPDN teacher network model and the PSD teacher network model, so that the feature extraction capability of the student network is effectively improved, the student network model realizes the extraction of multi-scale information of defogging images through the encoding and decoding of four scales, the global and local features of the defogging images are effectively fused, and the defogging effect of the images is further improved.

Description

Single image defogging method based on multi-teacher knowledge distillation

Technical Field

The invention belongs to the technical field of image defogging processing, and particularly relates to a single image defogging method based on multi-teacher knowledge distillation.

Background

At present, a teacher model in an image defogging method mainly comprises a defogging method based on prior information and a defogging method based on deep learning. The image defogging method based on prior information has advantages in the aspects of recovering the visibility, contrast and texture structure of the image, and the image defogging method based on deep learning has better effects in the aspects of improving the authenticity and color fidelity of the image. However, at present, knowledge learned by a single teacher model is generally transferred to a student model, so that the student model has similar performance to the teacher model, but the trained student model is often limited by the performance of the teacher model due to the fact that the single teacher model is adopted to carry out unidirectional knowledge transfer on a student network.

Therefore, a single image defogging method based on multi-teacher knowledge distillation, which is simple in structure and reasonable in design, is lacking at present, a student network model is trained through a EPDN teacher network model and a PSD teacher network model, the feature extraction capability of the student network is effectively improved, the student network model is used for extracting multi-scale information of defogging images through four-scale encoding and decoding, global and local features of defogging images are effectively fused, and then the defogging effect of the images is improved.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a single image defogging method based on multi-teacher knowledge distillation, which has simple steps and reasonable design, guides and trains a student network model through a EPDN teacher network model and a PSD teacher network model, effectively improves the characteristic extraction capability of the student network, realizes the extraction of multi-scale information of defogging images through the four-scale encoding and decoding of the student network model, effectively fuses global and local characteristics of defogging images, and further improves the defogging effect of the images.

In order to solve the technical problems, the invention adopts the following technical scheme: a single image defogging method based on multi-teacher knowledge distillation, which is characterized by comprising the following steps:

step one, acquiring a training set image:

selecting an indoor training set from the foggy day image database RESIDE; the indoor training set comprises foggy training images and foggy training images corresponding to the foggy training images, wherein the number of the foggy training images and the number of the foggy training images are the same;

step two, establishing a student network model:

The method for establishing the student network model comprises the following specific processes:

Step 201, establishing an encoder model of a student network by adopting a computer; the encoder model of the student network comprises a first scale network model, a second scale network model, a third scale network model and a fourth scale network model, wherein the first scale network model comprises a first convolution layer and two RDB modules based on PA, and the second scale network model comprises a second convolution layer, two RDB modules based on PA and a feature fusion module; the third scale network model comprises a third convolution layer, two RDB modules based on PA and a feature fusion module; the fourth scale network model comprises a fourth convolution layer, two RDB modules based on PA and a feature fusion module;

Step 202, adopting a computer to establish a decoder model of the student network; the decoder model of the student network comprises a first decoding network model, a second decoding network model, a third decoding network model, a fourth decoding network model and a fifth convolution layer, wherein the first decoding network model comprises two RDB modules based on PA, and the second decoding network model comprises a first transfer convolution layer, two RDB modules based on PA and a feature fusion module; the third decoding network model comprises a second transpose convolution layer, two RDB modules based on PA and a feature fusion module; the fourth decoding network model comprises a third transpose convolution layer, two RDB modules based on PA and a feature fusion module;

step three, extracting features of the foggy training images:

Step 301, extracting features of the foggy training image I through a first scale network model by adopting a computer to obtain a first scale feature map F _e1;

Step 302, extracting features of the first scale feature map F _e1 through a second scale network model by using a computer to obtain a second scale feature map F _e2;

step 303, extracting features of the second scale feature map F _e2 through a third scale network model by using a computer to obtain a third scale feature map F _e3;

step 304, extracting features of the third scale feature map F _e3 through a fourth scale network model by using a computer to obtain a fourth scale feature map F _e4;

step 305, performing feature extraction on the fourth scale feature map F _e4 through the first decoding network model by using a computer to obtain a first decoding feature map F _d1;

Step 306, extracting features of the first decoding feature map F _d1 through a second decoding network model by using a computer to obtain a second decoding feature map F _d2;

Step 307, performing feature extraction on the second decoding feature map F _d2 through a third decoding network model by using a computer to obtain a third decoding feature map F _d3;

Step 308, performing feature extraction on the third decoding feature map F _d3 through a fourth decoding network model by using a computer to obtain a fourth decoding feature map F _d4; performing feature extraction on the fourth decoding feature map F _d4 through fifth convolution by adopting a computer to obtain an output defogging image out;

309, processing the foggy training image I by using a EPDN teacher network model by using a computer to obtain a EPDN teacher network output defogging image out _EP, and recording a feature map output by a global sub-generator in the EPDN teacher network model as a EPDN teacher network intermediate output feature map EP ₁;

Processing the foggy training image I by a computer by using a teacher PSD network model to obtain a PSD teacher network output defogging image out _PS, and recording a feature map output by a trunk network in the teacher PSD network model as a PSD teacher network middle output feature map PS ₂;

Step four, establishing a total loss function:

step 401, adopting a computer according to Obtaining a perception loss function L _per; wherein I is a positive integer, n=5, Φ ⁱ (gt) represents a characteristic diagram of an defogging training image corresponding to the foggy training image I, which is output by Relu I _1 layer in the VGG19 network model, Φ ⁱ (out) represents a characteristic diagram of an output defogging image out of the student network model, which is output by Relu I _1 layer in the VGG19 network model, and I is more than or equal to 1 and less than or equal to 5; c _i、H_i and W _i represent the number of channels, length and width of the feature map output by the Relu i _1 layer, respectively; (Φ ⁱ(gt),Φⁱ(out))_L1 represents the Manhattan distance between the two feature maps of Relu i _1 layer output in the VGG19 network model;

step 402, obtaining a distillation loss function L _diss according to L_dist＝(out,out_EP)_L1+(out,out_PS)_L1+0.25(EP₁,F_d2)_L1+0.5(PS₂,F_d3)_L1, by adopting a computer; wherein, (out _EP)_L1 represents the manhattan distance between the output defogging image out of the student network model and the EPDN teacher network output defogging image out _EP, (out _PS)_L1 represents the manhattan distance between the output defogging image out of the student network model and the PSD teacher network output defogging image out _PS, (EP ₁,F_d2)_L1 represents the manhattan distance between the EPDN teacher network intermediate output feature map EP ₁ and the second decoding feature map F _d2 of the student network model, (PS ₂,F_d3)_L1 represents the manhattan distance between the PSD teacher network intermediate output feature map PS ₂ and the third decoding feature map F _d3 of the student network model);

Step 403, obtaining a total loss function L _loss by adopting a computer according to the L _loss＝0.1L_per+L_dist;

Training the student network model by the foggy training image:

step 501, adopting an Adam optimization algorithm by a computer, and performing iterative optimization on a student network model by using a total loss function L _loss until a training set is completely trained, and completing one-time iterative training;

step 502, repeating the iterative training in step 501 until the iterative training preset times are met, and obtaining a trained student network model;

Step six, defogging the single image by using the trained student network model:

And inputting any one foggy image into a trained student network model by adopting a computer to perform defogging treatment, so as to obtain a foggy image.

The single image defogging method based on multi-teacher knowledge distillation is characterized by comprising the following steps of: in step 201, the number of convolution kernels in the first convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;

The number of convolution kernels in the second convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, and the padding is 1;

The PA-based RDB module in step 201 includes a first conv+relu layer, a Conv1 convolution layer, an RDB module, a Conv2 convolution layer, and a Sigmoid activation function layer; the number of convolution kernels in the first Conv+ReLU layer is 32, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1; the number of convolution kernels in the Conv1 convolution layer is 32, the size of the convolution kernels is 1 multiplied by 1, the sliding step length is 1, and the padding is 0; the number of convolution kernels in the Conv2 convolution layer is 32, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

the number of convolution kernels in the third convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step length is 2, and the padding is 1;

The number of convolution kernels in the fourth convolution layer is 256, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 2, and the padding is 1;

The feature fusion module in step 201 includes a first conv+ InstanceNorm normalization+relu activation function layer and a second conv+ InstanceNorm normalization+relu activation function layer;

in step 202, the number of convolution kernels in the first transpose convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is1, and the out_padding is 1;

the number of convolution kernels in the second transpose convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;

the number of convolution kernels in the third transpose convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 2, the padding is 1, and the out_padding is 1;

The number of convolution kernels in the fifth convolution layer is 3, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.

The single image defogging method based on multi-teacher knowledge distillation is characterized by comprising the following steps of: in step 301, a computer is used to perform feature extraction on the foggy training image I through a first scale network model, so as to obtain a first scale feature map F _e1, which specifically includes the following steps:

Step 3011, performing feature extraction on the foggy training image I through a first convolution layer by adopting a computer to obtain an input feature map F _in;

Step 3012, inputting the input feature map F _in into a PA-based RDB module by a computer to perform feature extraction, so as to obtain an intermediate output feature map F _out;

Step 3013, according to the method described in step 3012, the computer inputs the intermediate output feature map F _out into another PA-based RDB module for feature extraction, to obtain a first scale feature map F _e1.

The single image defogging method based on multi-teacher knowledge distillation is characterized by comprising the following steps of: in step 302, a computer is used to perform feature extraction on the first scale feature map F _e1 through a second scale network model to obtain a second scale feature map F _e2, which specifically includes the following steps:

Step 3021, performing feature extraction on the first scale feature map F _e1 through a second convolution layer by using a computer to obtain a second input feature map;

Step 3022, inputting the second input feature map into a PA-based RDB module in the second scale network model by the computer to perform feature extraction, so as to obtain a second scale first coding feature map;

Step 3023, inputting the second-scale first coding feature map into another PA-based RDB module in the second-scale network model by the computer for feature extraction, so as to obtain a second-scale second coding feature map;

Step 3024, the computer downsamples the first scale feature map F _e1 by 0.5 times to obtain a first downsampled feature map;

step 3025, calling a splicing cat function module by a computer to splice the first downsampling feature map and the second-scale second coding feature map to obtain a first spliced feature map;

Step 3026, inputting the first spliced feature map into a feature fusion module in the second scale network model by using a computer to obtain a second scale feature map F _e2;

In step 303, a computer is used to perform feature extraction on the second scale feature map F _e2 through a third scale network model to obtain a third scale feature map F _e3, which specifically includes the following steps:

step 3031, a computer is adopted to conduct feature extraction on the second scale feature map F _e2 through a third convolution layer to obtain a third input feature map;

step 3032, the computer inputs the third input feature map into a PA-based RDB module in the third-scale network model to perform feature extraction to obtain a third-scale first coding feature map;

step 3033, the computer inputs the third-scale first decoding feature map into another RDB module based on PA in the third-scale network model to perform feature extraction, so as to obtain a third-scale second coding feature map;

step 3034, the computer performs 0.5 times downsampling on the second scale feature map F _e2 to obtain a second downsampled feature map;

The computer performs 0.25 times downsampling on the first scale feature map F _e1 to obtain a third downsampled feature map;

Step 3035, a computer is adopted to call a splicing cat function module to splice the second downsampling feature map, the third downsampling feature map and the third-scale second coding feature map to obtain a second spliced feature map;

Step 3036, inputting the second spliced feature map into a feature fusion module in the third-scale network model by adopting a computer to obtain a third-scale feature map F _e3;

In step 304, a computer is adopted to extract the features of the third scale feature map F _e3 through a fourth scale network model to obtain a fourth scale feature map F _e4, which specifically comprises the following steps:

step 3041, performing feature extraction on the third scale feature map F _e3 through a fourth convolution layer by adopting a computer to obtain a fourth input feature map;

Step 3042, inputting the fourth input feature map into a PA-based RDB module in a fourth-scale network model by a computer to perform feature extraction, so as to obtain a fourth-scale first coding feature map;

step 3043, inputting the fourth-scale first coding feature map into another RDB module based on PA in the fourth-scale network model by a computer for feature extraction to obtain a fourth-scale second coding feature map;

step 3044, the computer performs 0.5 times downsampling on the third scale feature map F _e3 to obtain a fourth downsampled feature map;

The computer performs 0.25 times downsampling on the second scale feature map F _e2 to obtain a fifth downsampled feature map;

The computer performs 0.125 times downsampling on the first scale feature map F _e1 to obtain a sixth downsampled feature map;

Step 3045, calling a splicing cat function module by a computer to splice the fourth downsampling feature map, the fifth downsampling feature map, the sixth downsampling feature map and the fourth-scale second coding feature map to obtain a third spliced feature map;

And step 3046, inputting the third spliced feature map into a feature fusion module in the fourth-scale network model by adopting a computer to obtain a fourth-scale feature map F _e4.

The single image defogging method based on multi-teacher knowledge distillation is characterized by comprising the following steps of: in step 3012, the computer inputs the input feature map F _in into a PA-based RDB module to perform feature extraction to obtain an intermediate feature map F _out, which specifically includes the following steps:

Step A, a computer performs feature extraction on an input feature map F _in through a first Conv+ReLU layer to obtain a feature map F _pre;

Step B, a computer inputs a characteristic diagram F _pre into a Conv1 convolution layer and an RDB module to perform characteristic extraction to obtain a characteristic diagram F _RDB, and simultaneously, inputs a characteristic diagram F _pre into a Conv2 convolution layer to perform convolution processing and normalizes the characteristic diagram through a Sigmoid activation function to obtain a space weight diagram F _s;

Step C, the computer is according to Obtaining a characteristic diagram F _mid; wherein/>Hadamard product operation between matrices representing feature maps,/>Representing addition operations between feature map matrices;

step D, the computer is according to Obtaining an intermediate feature map F _out;

The single image defogging method based on multi-teacher knowledge distillation is characterized by comprising the following steps of: in step 3026, step 3036 and step 3046, the first post-stitching feature map, the second post-stitching feature map and the third post-stitching feature map are recorded as post-stitching feature maps, and the second scale feature map F _e2, the third scale feature map F _e3 and the fourth scale feature map F _e4 are all recorded as post-fusion scale feature maps, and then the post-stitching feature maps are input into a feature fusion module by a computer to obtain a scale feature map, which comprises the following specific steps:

A1, performing feature processing on the spliced feature map through a first Conv+ InstanceNorm normalization and ReLU activation function layer by adopting a computer to obtain a fusion coding feature map;

and A2, performing feature processing on the fusion coding feature map through a second Conv+ InstanceNorm normalization and ReLU activation function layer by adopting a computer to obtain a fused scale feature map.

The single image defogging method based on multi-teacher knowledge distillation is characterized by comprising the following steps of: in step 305, a computer is used to extract features of the fourth scale feature map F _e4 through the first decoding network model to obtain a first decoding feature map F _d1, which specifically includes the following steps:

step 3051, performing feature extraction on the fourth scale feature map F _e4 by using a computer through a PA-based RDB module in the first decoding network model to obtain a first pre-decoding feature map;

Step 3052, inputting the first pre-decoding feature map into another PA-based RDB module in the first decoding network model by the computer to perform feature extraction, so as to obtain a first decoding feature map F _d1;

step 306, the specific process is as follows:

Step 3061, performing feature extraction on the first decoded feature map F _d1 through a first transposed convolutional layer by using a computer to obtain a first decoded first upsampled feature map;

Step 3062, performing feature extraction on the first decoded first upsampled feature map through two PA-based RDB modules in the second decoding network model by using a computer to obtain a first intermediate feature map;

step 3063, performing 2 times up-sampling processing on the first decoding feature map F _d1 by adopting a computer to obtain a first decoding second up-sampling feature map;

Step 3064, a computer is adopted to call a splicing cat function module to splice the first intermediate feature map and the first decoding second up-sampling feature map, so as to obtain a first decoding splicing feature map;

step 3065, inputting the first decoding spliced feature map into a feature fusion module in a second decoding network model by adopting a computer to obtain a second decoding feature map F _d2;

step 307, the specific process is as follows:

Step 3071, using a computer to perform feature extraction on the second decoded feature map F _d2 through a second transposed convolutional layer to obtain a second decoded first upsampled feature map;

step 3072, performing feature extraction on the second decoded first upsampled feature map by using a computer through two PA-based RDB modules in the third decoding network model to obtain a second intermediate feature map;

3073, performing 4 times up-sampling on the first decoding feature map F _d1 by adopting a computer to obtain a second decoding second up-sampling feature map;

performing 2 times up-sampling on the second decoding feature map F _d2 to obtain a second decoding third up-sampling feature map;

Step 3074, calling a splicing cat function module by a computer to splice the second intermediate feature map, the second decoding second up-sampling feature map and the second decoding third up-sampling feature map to obtain a second decoding splicing feature map;

Step 3075, inputting the second decoding spliced feature map into a feature fusion module in a third decoding network model by adopting a computer to obtain a third decoding feature map F _d3;

Step 308, the specific process is as follows:

Step 3081, performing feature extraction on the third decoded feature map F _d3 through a third transposed convolutional layer by using a computer to obtain a third decoded first upsampled feature map;

Step 3072, performing feature extraction on the third decoded first upsampled feature map by using a computer through two PA-based RDB modules in the fourth decoding network model to obtain a third intermediate feature map;

3073, performing 8 times up-sampling on the first decoding feature map F _d1 by adopting a computer to obtain a third decoding second up-sampling feature map;

performing 4 times up-sampling on the second decoding feature map F _d2 to obtain a third up-sampling feature map for third decoding;

Performing 2 times up-sampling on the third decoding feature map F _d3 to obtain a third decoding fourth up-sampling feature map;

Step 3074, calling a splicing cat function module by a computer to splice the third intermediate feature map, the third decoding second up-sampling feature map, the third decoding third up-sampling feature map and the third decoding fourth up-sampling feature map to obtain a third decoding splicing feature map;

And 3075, inputting the third decoding spliced feature map into a feature fusion module in a third decoding network model by adopting a computer to obtain a fourth decoding feature map F _d4.

Compared with the prior art, the invention has the following advantages:

1. The method has simple steps and reasonable design, and firstly, the training set image is acquired; secondly, establishing a student network model, extracting features of a foggy training image, establishing a total loss function, training the student network model by the foggy training image, defogging a single image by using the trained student network model, and improving the defogging effect of the image.

2. The student network model adopts the feature attention residual error dense block to carry out multi-scale feature extraction to generate the end-to-end feature image, thereby utilizing the advantages of the neural network and having stronger generalization capability.

3. The encoder model in the student network model comprises a first scale network model, a second scale network model, a third scale network model and a fourth scale network model, the decoder model comprises a first decoding network model, a second decoding network model, a third decoding network model and a fourth decoding network model, the four-scale downsampling characteristic extraction is realized through the encoder model, the four-scale upsampling characteristic extraction is realized through the decoder model, the extraction of multi-scale information of a defogging image is realized, the global and local characteristics of the defogging image are effectively fused, and the defogging effect of the image is further improved.

4. According to the invention, by adopting EPDN teacher network model and PSD teacher network model, knowledge migration from the teacher network to the student network is realized in a multi-teacher knowledge distillation mode, so that the student network can combine the complementary advantages of the image defogging method based on prior information and the image defogging method based on deep learning.

In summary, the method has simple steps and reasonable design, the student network model is guided and trained through the EPDN teacher network model and the PSD teacher network model, the feature extraction capability of the student network is effectively improved, the student network model realizes the extraction of multi-scale information of the defogging image through the encoding and decoding of four scales, the global and local features of the defogging image are effectively fused, and the defogging effect of the image is further improved.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

FIG. 1 is a block flow diagram of the method of the present invention.

Fig. 2 is a schematic diagram of the structure of the student network model of the present invention.

Fig. 3 is a schematic structural diagram of a residual error density block according to a feature of the present invention.

Fig. 4 is a schematic structural diagram of a feature fusion module according to the present invention.

Detailed Description

As shown in fig. 1 to 4, the single image defogging method based on multi-teacher knowledge distillation of the present invention comprises the following steps:

step one, acquiring a training set image:

step two, establishing a student network model:

step three, extracting features of the foggy training images:

Step four, establishing a total loss function:

Training the student network model by the foggy training image:

In this embodiment, in step 201, the number of convolution kernels in the first convolution layer is 32, the size of the convolution kernel is 3×3, the sliding step size is 1, and the padding is 1;

In this embodiment, in step 301, a computer is used to perform feature extraction on the foggy training image I through a first scale network model to obtain a first scale feature map F _e1, which specifically includes the following steps:

In this embodiment, in step 302, a computer is used to perform feature extraction on the first scale feature map F _e1 through a second scale network model to obtain a second scale feature map F _e2, which specifically includes the following steps:

In this embodiment, in step 3012, the computer inputs the input feature map F _in into a PA-based RDB module to perform feature extraction to obtain an intermediate feature map F _out, which specifically includes the following steps:

In this embodiment, in step 3026, step 3036 and step 3046, the first post-stitching feature map, the second post-stitching feature map and the third post-stitching feature map are recorded as post-stitching feature maps, and the second scale feature map F _e2, the third scale feature map F _e3 and the fourth scale feature map F _e4 are all recorded as post-fusion scale feature maps, and then the post-stitching feature maps are input into a feature fusion module by a computer to obtain a scale feature map, which comprises the following specific steps:

In this embodiment, in step 305, a computer is used to perform feature extraction on the fourth scale feature map F _e4 through the first decoding network model to obtain a first decoding feature map F _d1, which specifically includes the following steps:

step 306, the specific process is as follows:

step 307, the specific process is as follows:

Step 308, the specific process is as follows:

In this embodiment, it should be noted that the structures of the feature fusion modules in the second scale network model, the third scale network model and the fourth scale network model are the same and only the number of convolution kernels is different.

In this embodiment, it should be noted that the structures of the feature fusion modules in the second decoding network model, the third decoding network model, and the fourth decoding network model are the same and only the number of convolution kernels is different.

In this embodiment, the convolution layer in the first conv+ InstanceNorm normalization+relu activation function layer in the second scale network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 96, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;

The convolution layer in the second Conv+ InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 64, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.

The convolution layer in the first Conv+ InstanceNorm normalization+ReLU activation function layer in the third scale network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 224, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

The convolution layer in the second Conv+ InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 128, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.

The convolution layer in the first Conv+ InstanceNorm normalization+ReLU activation function layer in the fourth scale network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 480, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

The convolution layer in the second Conv+ InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 256, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.

The first Conv+ InstanceNorm normalization+ReLU activation function layer in the second decoding network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 384, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

The convolution layer in the second Conv+ InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 128, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

The first Conv+ InstanceNorm normalization+ReLU activation function layer in the third decoding network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 448, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

The convolution layer in the first Conv+ InstanceNorm normalization+ReLU activation function layer in the fourth decoding network model is a Conv3 convolution layer, the number of convolution kernels in the Conv3 convolution layer is 480, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

The convolution layer in the second Conv+ InstanceNorm normalization+ReLU activation function layer is a Conv4 convolution layer, the number of convolution kernels in the Conv4 convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1.

In this embodiment, the PA module is a spatial attention mechanism module, and the RDB is a residual error density block (Residual Dense Block).

In this embodiment, it should be noted that the Adam optimization algorithm, that is, adaptive momentum optimization algorithm, is a first-order optimization algorithm that can replace the conventional random gradient descent process, and can iteratively update the model parameters based on training data.

In this embodiment, the number of the foggy training images and the number of the foggy training images are 13990.

In this embodiment, it should be noted that, in actual use,Refers to Hadamard product between image matrices, for example, let the ith row and jth column elements in matrix A be aij and the ith row and jth column elements in matrix B be bij, then/>The ith row and jth column element in C is cij=aij×bij and A, B and C are homography matrices.

In this embodiment, it should be noted that the preset number of iterative training in step 502 is 30.

In this embodiment, it should be noted that, when i=1, Φ ¹ (gt) represents a feature map of the defogging training image gt output through Relu1_1 layer in the VGG19 network model, and Φ ¹ (out) represents a feature map of the defogging image out output through Relu1_1 layer in the VGG19 network model;

When i=2, Φ ² (gt) represents a feature map of the defogging training image gt output through Relu2_1 layer in the VGG19 network model, Φ ² (out) represents a feature map of the defogging image out output through Relu2_1 layer in the VGG19 network model;

When i=3, Φ ³ (gt) represents a feature map of the defogging training image gt output through Relu3_1 layer in the VGG19 network model, Φ ³ (out) represents a feature map of the defogging image out output through Relu3_1 layer in the VGG19 network model;

When i=4, Φ ⁴ (gt) represents a feature map of the defogging training image gt output through Relu4_1 layer in the VGG19 network model, and Φ ⁴ (out) represents a feature map of the defogging image out output through Relu4_1 layer in the VGG19 network model;

When i=5, Φ ⁵ (gt) represents a feature map of the defogging training image gt output through Relu5_1 layer in the VGG19 network model, and Φ ⁵ (out) represents a feature map of the defogging image out output through Relu5_1 layer in the VGG19 network model.

In this embodiment, the downsampling is nearest neighbor downsampling, and the upsampling is nearest neighbor upsampling.

In this embodiment, the number of channels of the image is unchanged as the downsampling by 0.5 times, and the size of the image is changed to 1/2 of the original size; downsampling by 0.25 times, namely, the channel number of the image is unchanged, and the size of the image is changed to 1/4 of the original size; the downsampling is 0.125 times, namely the channel number of the image is unchanged, and the size of the image is changed to 1/8 of the original size.

In this embodiment, it should be noted that, the number of channels of the image is unchanged as the up-sampling is 2 times, and the size of the image is changed to 2 as the original size; 4 times up sampling, namely the channel number of the image is unchanged, and the size of the image is changed into 4; the 8 times up sampling is that the channel number of the image is unchanged, and the size of the image is changed to 8.

In this embodiment, the foggy training image I is a three-channel RGB color image, i.e., 3×256×256. The size of the fog monitoring image is 3×256×256.

In this embodiment, the size of the EPDN teacher network output defogging image out _EP and the size of the PSD teacher network output defogging image out _PS are 3×256×256, the size of the epdn teacher network intermediate output feature map EP ₁ is 128×64×64, and the size of the PSD teacher network intermediate output feature map PS ₂ is 64×128×128.

In this embodiment, the size of the feature map F _pre is 32×256×256, the size of the feature map F _RDB is 32×256×256, the size of the feature map F _mid is 32×256×256, and the size of the feature map F _s is 32×256×256.

In the present embodiment, the size of the feature map is expressed by the number of channels×length×width, the size of the input feature map F _in is 32×256×256, the size of the output feature map F _out is 32×256×256, and the size of the first scale feature map F _e1 is 32×256×256;

the second input feature map has a size of 64×128×128, the second-scale first coding feature map has a size of 64×128×128, the second-scale second coding feature map has a size of 64×128×128, the first downsampled feature map has a size of 32×128×128, the first post-splice feature map has a size of 96×128×128, and the second-scale feature map F _e2 has a size of 64×128×128;

The third input feature map has a size of 128 x 64, the third scale first encoding feature map has a size of 128 x 64, the third scale second encoding feature map has a size of 128 x 64, the second downsampled feature map has a size of 64 x 64, the third downsampled feature map has a size of 32 x 64, the second post-stitching feature map has a size 224×64×64, and the third scale feature map F _e3 has a size 128×64×64;

The fourth input feature map has a size of 256 x 32, the fourth scale first encoding feature map has a size of 256 x 32, the fourth scale second encoding feature map has a size of 256 x 32, the fourth downsampling feature map has a size of 128 x 32, the fifth downsampled feature map has a size of 64 x 32, the sixth downsampled feature map has a size of 32 x 32, the third post-stitching feature map has a size of 480×32×32, and the fourth scale feature map F _e4 has a size of 256×32×32.

In this embodiment, the first pre-coding feature map and the first coding feature map F _d1 have a size of 256×32×32;

The size of the first decoded first up-sampled feature map is 128×64×64, the size of the first intermediate feature map is 128×64×64, the size of the first decoded second up-sampled feature map is 256×64×64, the size of the first decoded splice feature map is 384×64×64, and the size of the second decoded feature map F _d2 is 128×64×64;

The second decoded first upsampled feature map has a size of 64 x 128, the second intermediate feature map has a size of 64 x 128, the second decoded second upsampled feature map has a size of 256 x 128, the second decoded third upsampled feature map has a size of 128 x 128, the second decoded splice feature map has a size of 448 x 128, and the third decoded feature map F _d3 has a size of 64 x 128;

the size of the third decoded first upsampled feature map is 32×256×256, the size of the third intermediate feature map is 32×256×256, the size of the third decoded second upsampled feature map is 256×256, the size of the third decoded third upsampled feature map is 128×256×256, the size of the third decoded fourth upsampled feature map is 64×256×256, the size of the third decoded splice feature map is 480×256×256, and the size of the fourth decoded feature map F _d4 is 32×256×256;

the size of the output defogging image out is 3×256×256.

In this embodiment, it should be noted that, the first decoding and splicing feature map, the second decoding and splicing feature map, and the third decoding and splicing feature map respectively perform feature processing through a first conv+ InstanceNorm normalization+relu activation function layer and a second conv+ InstanceNorm normalization+relu activation function layer in the feature fusion module, so as to obtain a second decoding feature map, a third decoding feature map, and a fourth decoding feature map.

The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and any simple modification, variation and equivalent structural changes made to the above embodiment according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims

1. A single image defogging method based on multi-teacher knowledge distillation, which is characterized by comprising the following steps:

step one, acquiring a training set image:

step two, establishing a student network model:

step three, extracting features of the foggy training images:

Step four, establishing a total loss function:

Training the student network model by the foggy training image:

Inputting any one foggy image into a trained student network model by adopting a computer to perform defogging treatment to obtain a foggy image;

The PA-based RDB module in step 201 includes a first conv+relu layer, a Conv1 convolution layer, an RDB module, a Conv2 convolution layer, and a Sigmoid activation function layer.

2. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 201, the number of convolution kernels in the first convolution layer is 32, the size of the convolution kernels is 3×3, the sliding step size is 1, and the padding is 1;

The number of convolution kernels in the first Conv+ReLU layer is 32, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1; the number of convolution kernels in the Conv1 convolution layer is 32, the size of the convolution kernels is 1 multiplied by 1, the sliding step length is 1, and the padding is 0; the number of convolution kernels in the Conv2 convolution layer is 32, the size of the convolution kernels is 3 multiplied by 3, the sliding step length is 1, and the padding is 1;

the number of convolution kernels in the fourth convolution layer is 256, the size of the convolution kernels is 3×3, the sliding step length is 2, and the padding is 1;

3. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 301, a computer is used to perform feature extraction on the foggy training image I through a first scale network model, so as to obtain a first scale feature map F _e1, which specifically includes the following steps:

4. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 302, a computer is used to perform feature extraction on the first scale feature map F _e1 through a second scale network model to obtain a second scale feature map F _e2, which specifically includes the following steps:

5. A single image defogging method based on multi-teacher knowledge distillation according to claim 3, characterized in that: in step 3012, the computer inputs the input feature map F _in into a PA-based RDB module to perform feature extraction to obtain an intermediate feature map F _out, which specifically includes the following steps:

step D, the computer is according to An intermediate feature map F _out is obtained.

6. A single image defogging method based on multi-teacher knowledge distillation according to claim 4, wherein: in step 3026, step 3036 and step 3046, the first post-stitching feature map, the second post-stitching feature map and the third post-stitching feature map are recorded as post-stitching feature maps, and the second scale feature map F _e2, the third scale feature map F _e3 and the fourth scale feature map F _e4 are all recorded as post-fusion scale feature maps, and then the post-stitching feature maps are input into a feature fusion module by a computer to obtain a scale feature map, which comprises the following specific steps:

7. A single image defogging method based on multi-teacher knowledge distillation according to claim 1, wherein: in step 305, a computer is used to extract features of the fourth scale feature map F _e4 through the first decoding network model to obtain a first decoding feature map F _d1, which specifically includes the following steps:

step 306, the specific process is as follows:

step 307, the specific process is as follows:

Step 308, the specific process is as follows: