CN115187482A - Image rain removing algorithm based on multi-scale attention distillation learning network - Google Patents
Image rain removing algorithm based on multi-scale attention distillation learning network
- Publication number: CN115187482A
- Application number: CN202210807681.7A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/73 — Deblurring; Sharpening (G06T5/00 Image enhancement or restoration)
- G06N3/08 — Learning methods (G06N3/02 Neural networks; G06N3/00 Computing arrangements based on biological models)
- G06T3/4038 — Image mosaicing, e.g. composing plane images from plane sub-images (G06T3/40 Scaling of whole images or parts thereof)
- G06T2207/20081 — Training; Learning (G06T2207/20 Special algorithmic details)
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The invention discloses an image rain removal algorithm based on a multi-scale attention distillation learning network. The network explores the correlation between image scales and spatial locations, and adopts a new densely connected distillation structure to learn and represent richer features effectively, while alleviating the vanishing-gradient problem, strengthening feature propagation, and improving model performance. The invention proposes a multi-scale attention module (MAB) and a parallel attention distillation module (CADB). The MAB uses dilated convolutions of different sizes to extract features at different scales. The CADB combines channel attention and subspace attention mechanisms to recalibrate the rain-streak feature maps, reducing useless features while preserving spatial and background information.
Description
Technical Field
The invention relates to the field of electronics, in particular to an image rain removal algorithm based on a multi-scale attention distillation learning network.
Background
Rain causes severe image blurring and degradation of visual quality. In particular, the veil-like degradation formed when rain fog, rain streaks, and raindrops in the air are superimposed on the background greatly reduces scene contrast and visibility. Image restoration in rainy weather has therefore become a necessary preprocessing step for downstream tasks such as object tracking, scene analysis, and road-condition detection for autonomous driving. In recent years, single-image rain removal has attracted wide attention in computer vision and pattern recognition, and many practical applications urgently require the recovery of clean images captured in rain.
In recent years, more and more researchers have focused on algorithms for single-image rain removal, and the proposed methods have gradually shifted from model-driven to data-driven. Model-driven methods include filter-based and prior-based approaches. Filter-based methods restore a clean image with physical filtering; for example, bilateral filtering separates a rainy image into a low-frequency component and a high-frequency component, and the restoration result is obtained by combining the low-frequency component with the rain-free part of the high-frequency component. Prior-based methods treat restoration of the rainy image as an optimization problem; for example, with a low-rank representation, rain streaks are separated from the original image by a mixed feature set and removed. However, model-driven methods can only filter out noise that obeys a specific distribution (such as Gaussian noise), and the resulting images have limited sharpness. The proposed physical models therefore have inherent limitations: they cannot express the main characteristics of rainy images well, and their restoration effect is limited. In contrast, data-driven methods treat image rain removal as the learning of a nonlinear function. A joint rain detection and removal network focuses on removing overlapping rain streaks in heavy rain, but loses some texture details. To avoid this loss, a deep detail network was further proposed, yet it cannot handle very dense rain streaks. With the popularity of convolutional neural networks, more and more CNN-based methods have been proposed; to make deep network structures easier to reproduce, a simple and effective progressive recursive rain removal network was presented.
Likewise, a lightweight pyramid network built from few parameters keeps the architecture simple. However, most existing single-image rain removal networks do not capture the internal connections among rain streaks at different scales well. A squeeze-and-excitation network adopts dilated convolutions to obtain background information and uses a recurrent neural network to remove rain features progressively. More recently, researchers proposed a multi-stream dilated residual dense network that obtains rain-streak features more accurately through multi-scale extraction and residual dense connections. Methods based on generative adversarial networks (GANs), introduced to capture characteristics of inclement weather that cannot be modeled or synthesized, started later than CNN-based methods; they aim to reduce the gap between the generated result and the true clean image. A conditional GAN applied directly to the single-image rain removal task yields better illumination, color, and contrast distributions. However, GAN-based methods are not good at capturing detailed image information and thus perform poorly on images with diverse rain streaks. To further improve recovery on real rainy images, semi-supervised and unsupervised learning methods have recently been proposed; they learn features directly from real rain data, improving the generalization and scalability of the method.
The above analysis shows that existing single-image rain removal methods do not fully exploit image feature correlations in scale-space, and therefore recover detail, texture, and information poorly.
Disclosure of Invention
In order to overcome the defects, the invention adopts the following technical scheme:
the invention provides an end-to-end multi-scale attention distillation learning network to solve single-image rain removal. The network mainly consists of a multi-scale attention module (MAB) and a parallel attention distillation module (CADB), used for feature extraction and distillation refinement, respectively. In the feature extraction stage, a convolutional layer with kernel size 3×3 first extracts shallow original image features, and the result also serves as the input of the first MAB. The original image features then undergo further multi-scale feature extraction sequentially through 8 densely connected MABs, yielding deep image features. In the distillation refinement stage, the outputs of the first 7 MABs are concatenated, the number of channels is adjusted by a 1×1 convolution, and the result is fed to the CADB, which calibrates and refines the input feature information and filters out useless feature information.
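For illustration, the feature-extraction and distillation-refinement flow described above can be sketched in PyTorch. This is a minimal sketch under stated assumptions, not the invention's actual implementation: the MAB is replaced by a plain 3×3 convolution block, dense connections are simplified to a sequential pass, and all class names, channel counts, and the output head (`DerainPipeline`, `SimpleBlock`, `ch=32`) are hypothetical.

```python
import torch
import torch.nn as nn


class SimpleBlock(nn.Module):
    """Stand-in for one MAB; the real module is multi-scale with attention."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.conv(x))


class DerainPipeline(nn.Module):
    """Sketch of the described flow: a shallow 3x3 conv, 8 MABs
    (dense connections simplified to a sequential pass here), concatenation
    of the first 7 MAB outputs, and a 1x1 conv to adjust channels before
    the distillation stage."""
    def __init__(self, ch=32, n_blocks=8):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)        # shallow feature extraction
        self.mabs = nn.ModuleList(SimpleBlock(ch) for _ in range(n_blocks))
        self.fuse = nn.Conv2d(7 * ch, ch, 1)              # channel adjustment after concat
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)        # hypothetical output head

    def forward(self, x):
        f = self.head(x)
        outs, cur = [], f
        for mab in self.mabs:
            cur = mab(cur)
            outs.append(cur)
        fused = self.fuse(torch.cat(outs[:7], dim=1))     # first 7 MAB outputs
        return self.tail(fused)


img = torch.randn(1, 3, 64, 64)
out = DerainPipeline()(img)
assert out.shape == (1, 3, 64, 64)
```

The sketch only shows the tensor flow and where the CADB-based refinement would attach; the attention modules themselves are sketched with the detailed description below.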
The invention comprises the following steps: establishing a multi-scale attention distillation learning network for removing rain from a single image, wherein the whole network architecture mainly comprises a multi-scale attention module and a parallel attention distillation module which are respectively used for feature extraction and knowledge distillation;
parallel attention distillation module process: let the received input be L_0. The parallel attention distillation module divides the input feature map L_0 into n mutually exclusive groups [L_1, L_2, ..., L_η, ..., L_n], where L_η is defined as one group of intermediate feature maps. The overall architecture can be formulated as:

A_η = softmax(Conv_1×1(Conv_3×3(DWConv_1×1(L_η) + maxpooling_3×3(L_η)))),  (1)

L̂_η = A_η ⊙ L_η,  (2)

SE(L_0) = F_scale(L_0, Sigmoid(FC_(c/r,c)(ReLU(FC_(c,c/r)(globalpooling_3×3(L_0)))))),  (3)

CADB(L_0) = Concat([L̂_1, L̂_2, ..., L̂_n]) + SE(L_0),  (4)

where in formula (1), maxpooling_3×3 is a max-pooling operation with kernel size 3×3, DWConv_1×1 is a depthwise separable convolution with kernel size 1×1, and Conv_3×3 and Conv_1×1 are ordinary convolutions with kernel sizes 3×3 and 1×1, respectively. A_η is the attention feature map inferred from the group of intermediate feature maps L_η; within each group (subspace), the attention map A_η collects cross-channel information through learning and captures the nonlinear dependencies between feature maps. To ensure that A_η is a valid attention weighting tensor, a gating mechanism with softmax activation is adopted. After formula (2), each group of feature maps yields a redefined feature map set L̂_η, where ⊙ denotes element-wise multiplication. Formula (3) represents the channel attention branch: globalpooling_3×3 is average pooling with kernel size 3×3, and FC_(c,c/r) and FC_(c/r,c) are the fully connected layers that squeeze and then restore (excite) the feature map along the channel dimension, with compression ratio r. The final output of the CADB is jointly obtained from formulas (1)-(3) as formula (4), where Concat re-splices and combines the feature maps of each group;
the CADB is embedded in each MAB, and the output of each MAB serves as the input of a CADB, to compensate for information loss during multi-scale feature acquisition, transmission, and fusion; the core of the CADB is to recalibrate feature information with parallel channel and subspace attention mechanisms to achieve feature refinement, which is very helpful for mining scale-space feature information;
multi-scale attention module processing: let the input feature image of the MAB be F_in. First, convolutional layers with kernel sizes 1×1, 3×3, and 5×5 are applied, and the outputs are expressed as:

F¹_n×n = Conv_n×n(F_in; η¹_n×n), n ∈ {1, 3, 5},  (5)

where F¹_n×n denotes the first-layer output of the multi-scale convolution with kernel size n×n, Conv_n×n(·) denotes a convolution operation, and η¹_n×n denotes the parameters of the first multi-scale convolution layer with kernel size n×n. The outputs are further processed by multi-scale convolutions with kernel sizes 1×1, 3×3, and 5×5 to extract image features:

F²_n×n = Conv_n×n(F¹_n×n; η²_n×n),  (6)

where F²_n×n denotes the second-layer output of the multi-scale convolution with kernel size n×n and η²_n×n denotes the parameters of the second multi-scale convolution layer. The output of the third multi-scale convolution layer can similarly be expressed as:

F³_n×n = Conv_n×n(F²_n×n; η³_n×n),  (7)

after the multi-scale convolution layers, the MAB realizes intra-layer information fusion through convolutions with kernel sizes 1×1 and 3×3; finally, a CADB is introduced to improve the representation of useful image feature information, and the final output of the MAB is expressed as:

F_out = CADB(Conv_3×3(Conv_1×1(Concat(F³_1×1, F³_3×3, F³_5×5)))),  (8)

wherein F_out denotes the output of the MAB, CADB(·) denotes the parallel attention distillation module, and {η_1; η_2; η_3; η_4} denote the hyperparameters of the MAB output.
The invention has the beneficial effects that: the invention effectively learns image features over a richer scale-space, thereby achieving a good rain removal effect while better retaining the original texture details of the image.
The invention provides a multi-scale attention distillation learning network to explore the correlation between image scales and spatial locations, adopting a new densely connected distillation structure to learn and represent richer features effectively while alleviating the vanishing-gradient problem, strengthening feature propagation, and improving model performance.
The invention proposes a multi-scale attention module (MAB) and a parallel attention distillation module (CADB). The MAB uses dilated convolutions of different sizes to extract features at different scales. The CADB combines channel attention and subspace attention mechanisms to recalibrate the rain-streak feature maps, reducing useless features while preserving spatial and background information.
The present invention was evaluated on synthetic and real-world rain datasets (4 synthetic datasets and 2 real-world datasets). Ablation studies were conducted to verify the rationality and necessity of the key modules in the network.
Drawings
FIG. 1 is a diagram of the multi-scale attention distillation learning network architecture.
FIG. 2 is a diagram of the parallel attention distillation module.
FIG. 3 is a diagram of the multi-scale attention module.
Detailed Description
The invention provides an image rain removal algorithm based on a multi-scale attention distillation learning network, as shown in FIG. 1. The overall network architecture mainly consists of a multi-scale attention module and a parallel attention distillation module, used for feature extraction and knowledge distillation, respectively. The parallel attention distillation module and the multi-scale attention module are described in detail below.
1. Parallel attention distillation module
The present design proposes a multi-scale attention distillation learning network for single-image rain removal, as shown in FIG. 1. The overall network architecture mainly consists of a multi-scale attention module (MAB) and a parallel attention distillation module (CADB), used for feature extraction and knowledge distillation, respectively.
The key to further improving single-image rain removal is how to better acquire and characterize rain-streak features for removal. Although a deeper network helps extract rain-streak features, its ability to describe image features gradually weakens during transmission as the network depth increases, and a large amount of redundant feature information is generated.
How these problems are solved directly affects the quality of the restored image. A parallel architecture combining a channel attention module and a subspace attention module eliminates a large number of redundant features and extracts more useful image features.
The parallel channel and subspace attention mechanism focuses on acquiring spatial and channel feature information and allows only features containing useful information to be transmitted further; the structure is shown in FIG. 2. Let the received input feature be L_0. The CADB divides the input feature map L_0 into n mutually exclusive groups [L_1, L_2, ..., L_η, ..., L_n], where L_η is defined as one group of intermediate feature maps. The overall architecture can be formulated as:

A_η = softmax(Conv_1×1(Conv_3×3(DWConv_1×1(L_η) + maxpooling_3×3(L_η)))),  (1)

L̂_η = A_η ⊙ L_η,  (2)

SE(L_0) = F_scale(L_0, Sigmoid(FC_(c/r,c)(ReLU(FC_(c,c/r)(globalpooling_3×3(L_0)))))),  (3)

CADB(L_0) = Concat([L̂_1, L̂_2, ..., L̂_n]) + SE(L_0),  (4)

where in formula (1), maxpooling_3×3 is a max-pooling operation with kernel size 3×3, DWConv_1×1 is a depthwise separable convolution with kernel size 1×1, and Conv_3×3 and Conv_1×1 are ordinary convolutions with kernel sizes 3×3 and 1×1, respectively. A_η is the attention feature map inferred from the group of intermediate feature maps L_η; within each group (subspace), the attention map A_η collects cross-channel information through learning and captures the nonlinear dependencies between feature maps. To ensure that A_η is a valid attention weighting tensor, a gating mechanism with softmax activation is adopted. After formula (2), each group of feature maps yields a redefined feature map set L̂_η, where ⊙ denotes element-wise multiplication. Formula (3) represents the channel attention branch: globalpooling_3×3 is average pooling with kernel size 3×3, and FC_(c,c/r) and FC_(c/r,c) are the fully connected layers that squeeze and then restore (excite) the feature map along the channel dimension, with compression ratio r. The final output of the CADB is jointly obtained from formulas (1)-(3) as formula (4), where Concat re-splices and combines the feature maps of each group.
The CADB is embedded in each MAB, with the output of each MAB used as the input of a CADB, to compensate for information loss during multi-scale feature acquisition, transmission, and fusion. The core of the CADB is to recalibrate feature information using parallel channel and subspace attention mechanisms to achieve feature refinement, which is very helpful for mining scale-space feature information.
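To make the data flow of formulas (1)-(4) concrete, the CADB can be sketched in PyTorch as follows. This is a sketch under stated assumptions, not the patented implementation: the group count, reduction ratio, the softmax axis (channel dimension here), and the additive combination of the subspace and channel branches are all illustrative choices, and the class and variable names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubspaceAttention(nn.Module):
    """One group's attention map A_eta (formula 1). The softmax axis
    (channel dimension) is an assumption for illustration."""
    def __init__(self, gc):
        super().__init__()
        self.dw = nn.Conv2d(gc, gc, 1, groups=gc)     # depthwise 1x1 conv
        self.conv3 = nn.Conv2d(gc, gc, 3, padding=1)  # 3x3 conv
        self.conv1 = nn.Conv2d(gc, gc, 1)             # 1x1 conv

    def forward(self, x):
        y = self.dw(x) + F.max_pool2d(x, 3, stride=1, padding=1)
        return torch.softmax(self.conv1(self.conv3(y)), dim=1)


class CADB(nn.Module):
    """Sketch of the parallel attention distillation module: the input is
    split into mutually exclusive channel groups, each reweighted by its
    subspace attention map (formula 2), while a squeeze-and-excite channel
    branch (formula 3) runs in parallel. Combining the two branches by
    addition (formula 4) is an assumption."""
    def __init__(self, channels, groups=4, reduction=4):
        super().__init__()
        assert channels % groups == 0
        gc = channels // groups
        self.groups = groups
        self.atts = nn.ModuleList(SubspaceAttention(gc) for _ in range(groups))
        self.fc1 = nn.Linear(channels, channels // reduction)  # squeeze
        self.fc2 = nn.Linear(channels // reduction, channels)  # excite

    def forward(self, x):
        parts = torch.chunk(x, self.groups, dim=1)
        refined = [att(p) * p for att, p in zip(self.atts, parts)]  # formula (2)
        sub = torch.cat(refined, dim=1)                             # Concat
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(
            F.adaptive_avg_pool2d(x, 1).flatten(1)))))              # formula (3)
        se = x * w[:, :, None, None]
        return sub + se                                             # formula (4), assumed


f = torch.randn(2, 32, 16, 16)
out = CADB(32)(f)
assert out.shape == f.shape
```

Because every operation preserves the spatial and channel dimensions, the module can be dropped after any MAB output without reshaping, which matches its role as an embedded refinement stage.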
2. Multi-scale attention module
Multi-scale feature acquisition effectively combines image features at different scales and is now widely used to extract useful information about a target and its surroundings. To further improve the network's ability to acquire and characterize rain-streak features, the multi-scale attention module adopts intra-layer multi-scale information fusion, realizing fusion among features of different scales. The structure also ensures that input information can propagate through all parameter layers, so the feature information of the original image is learned better. Guided by these ideas, a multi-scale attention module (MAB) is proposed to learn feature information of different scale-spaces in rainy images more fully and effectively, as shown in FIG. 3.
The multi-scale attention module (MAB) can be described by the following mathematical formulas. Referring to FIG. 3, let the input feature image of the multi-scale attention module be F_in. First, convolutional layers with kernel sizes 1×1, 3×3, and 5×5 are applied, and the outputs are expressed as:

F¹_n×n = Conv_n×n(F_in; η¹_n×n), n ∈ {1, 3, 5},  (5)

where F¹_n×n denotes the first-layer output of the multi-scale convolution with kernel size n×n, Conv_n×n(·) denotes a convolution operation, and η¹_n×n denotes the parameters of the first multi-scale convolution layer with kernel size n×n. The outputs are further processed by multi-scale convolutions with kernel sizes 1×1, 3×3, and 5×5 to extract image features:

F²_n×n = Conv_n×n(F¹_n×n; η²_n×n),  (6)

where F²_n×n denotes the second-layer output of the multi-scale convolution with kernel size n×n and η²_n×n denotes the parameters of the second multi-scale convolution layer. Similarly, the output of the third multi-scale convolution layer can be expressed as:

F³_n×n = Conv_n×n(F²_n×n; η³_n×n),  (7)

From FIG. 3, after the multi-scale convolution layers, the MAB realizes intra-layer information fusion through convolutions with kernel sizes 1×1 and 3×3, and finally introduces a CADB to improve the representation of useful image feature information. The final output of the MAB is expressed as:

F_out = CADB(Conv_3×3(Conv_1×1(Concat(F³_1×1, F³_3×3, F³_5×5)))),  (8)

wherein F_out denotes the output of the MAB, CADB(·) denotes the parallel attention distillation module, and {η_1; η_2; η_3; η_4} denote the hyperparameters of the MAB output.
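The structure above can be sketched in PyTorch. This is a sketch under stated assumptions: whether each scale branch stays separate across the three multi-scale layers is an assumption, the CADB stage is replaced by an identity stand-in to keep the example self-contained, and the class name `MAB` and its channel count are illustrative.

```python
import torch
import torch.nn as nn


class MAB(nn.Module):
    """Sketch of the multi-scale attention module: three stacked multi-scale
    convolution layers (kernels 1x1, 3x3, 5x5), intra-layer fusion via 1x1
    and 3x3 convolutions, then a final attention stage (stand-in here)."""
    def __init__(self, ch):
        super().__init__()

        def ms_layer():
            # One multi-scale layer: parallel 1x1, 3x3, 5x5 convolutions,
            # padded so every branch preserves the spatial size.
            return nn.ModuleList(
                nn.Conv2d(ch, ch, k, padding=k // 2) for k in (1, 3, 5))

        self.layers = nn.ModuleList(ms_layer() for _ in range(3))
        self.fuse1 = nn.Conv2d(3 * ch, ch, 1)   # intra-layer fusion, 1x1
        self.fuse3 = nn.Conv2d(ch, ch, 3, padding=1)  # intra-layer fusion, 3x3
        self.cadb = nn.Identity()               # stand-in for the real CADB

    def forward(self, f_in):
        # Assumption: each scale branch propagates separately through the
        # three multi-scale layers (formulas 5-7), then the branches are
        # concatenated and fused (formula 8).
        branches = [f_in] * 3
        for layer in self.layers:
            branches = [conv(b) for conv, b in zip(layer, branches)]
        fused = self.fuse3(self.fuse1(torch.cat(branches, dim=1)))
        return self.cadb(fused)


x = torch.randn(1, 16, 32, 32)
assert MAB(16)(x).shape == x.shape
```

Since the output shape matches the input shape, eight such blocks can be chained (and their outputs concatenated) exactly as the overall network description requires.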
3. Loss function
To train the designed network, a hybrid loss function is used, consisting of a Structural Similarity Index (SSIM) loss and an L_1 loss. Specifically, the SSIM loss evaluates structural similarity and can better preserve high-frequency structural information, while the L_1 loss provides an efficient way to constrain differences in color and luminance.
These two loss functions can be expressed as:

L_1 = ||R − GT||_1,  (15)

L_s = 1 − SSIM(R, GT),  (16)

where the L_1 loss and the SSIM loss are denoted L_1 and L_s, respectively, R is the restored (rain-removed) image, and GT denotes the ground-truth rain-free image. By a weighted sum of the L_1 loss and the SSIM loss, the final hybrid loss function can be expressed as:
L total =L 1 +λL s , (17)
where λ is a weighting parameter, empirically set to 0.2.
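The hybrid loss can be sketched as follows. The box-window SSIM below is a simplification of the standard Gaussian-window SSIM, used for illustration only; the function names are hypothetical, while the λ = 0.2 weighting follows the text.

```python
import torch
import torch.nn.functional as F


def ssim(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified mean SSIM using a uniform (box) window instead of the
    usual Gaussian window -- an approximation for illustration."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, pad)
    mu_y = F.avg_pool2d(y, window, 1, pad)
    var_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()


def hybrid_loss(restored, gt, lam=0.2):
    """L_total = L_1 + lambda * L_s, with L_s = 1 - SSIM and lambda = 0.2."""
    l1 = torch.abs(restored - gt).mean()
    ls = 1.0 - ssim(restored, gt)
    return l1 + lam * ls


gt = torch.rand(1, 3, 32, 32)
loss = hybrid_loss(gt, gt)
assert loss.item() < 1e-5  # identical images give a near-zero loss
```

Since both terms are differentiable, `hybrid_loss` can be minimized directly with any standard optimizer during training.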
The invention provides a multi-scale attention distillation learning network for single-image rain removal, using dense connections to realize feature reuse and full propagation. To better acquire and characterize rain-streak feature information, a multi-scale attention module is introduced to extract local and global features. In addition, the design employs a parallel attention distillation module that recalibrates intra-layer and inter-layer image features using channel attention and subspace attention mechanisms, reducing unwanted features while preserving spatial and background information. Quantitative and visual results on synthetic and real datasets show that the proposed method outperforms the compared mainstream algorithms. Future work will target improved generalization: exploring inter-domain adaptation for real rain scenes, which exhibit complex noise and many degradation factors, by using multi-source synthetic datasets and adopting inter-domain adaptation to realize domain migration and weight assignment of synthetic data, so as to better model real rain image information and further improve the robustness and generalization of the rain removal algorithm.
Claims (1)
1. An image rain removing algorithm based on a multi-scale attention distillation learning network is characterized in that: the method comprises the following steps: establishing a multi-scale attention distillation learning network for removing rain from a single image, wherein the whole network architecture mainly comprises a multi-scale attention module and a parallel attention distillation module which are respectively used for feature extraction and knowledge distillation;
parallel attention distillation module process: let the received input be L_0. The parallel attention distillation module divides the input feature map L_0 into n mutually exclusive groups [L_1, L_2, ..., L_η, ..., L_n], where L_η is defined as one group of intermediate feature maps. The overall architecture can be formulated as:

A_η = softmax(Conv_1×1(Conv_3×3(DWConv_1×1(L_η) + maxpooling_3×3(L_η)))),  (1)

L̂_η = A_η ⊙ L_η,  (2)

SE(L_0) = F_scale(L_0, Sigmoid(FC_(c/r,c)(ReLU(FC_(c,c/r)(globalpooling_3×3(L_0)))))),  (3)

CADB(L_0) = Concat([L̂_1, L̂_2, ..., L̂_n]) + SE(L_0),  (4)

where in formula (1), maxpooling_3×3 is a max-pooling operation with kernel size 3×3, DWConv_1×1 is a depthwise separable convolution with kernel size 1×1, and Conv_3×3 and Conv_1×1 are ordinary convolutions with kernel sizes 3×3 and 1×1, respectively. A_η is the attention feature map inferred from the group of intermediate feature maps L_η; within each group (subspace), the attention map A_η collects cross-channel information through learning and captures the nonlinear dependencies between feature maps. To ensure that A_η is a valid attention weighting tensor, a gating mechanism with softmax activation is adopted. After formula (2), each group of feature maps yields a redefined feature map set L̂_η, where ⊙ denotes element-wise multiplication. Formula (3) represents the channel attention branch: globalpooling_3×3 is average pooling with kernel size 3×3, and FC_(c,c/r) and FC_(c/r,c) are the fully connected layers that squeeze and then restore (excite) the feature map along the channel dimension, with compression ratio r. The final output of the CADB is jointly obtained from formulas (1)-(3) as formula (4), where Concat re-splices and combines the feature maps of each group;
the CADB is embedded in each MAB, and the output of each MAB serves as the input of a CADB, to compensate for information loss during multi-scale feature acquisition, transmission, and fusion; the core of the CADB is to recalibrate feature information using parallel channel and subspace attention mechanisms to achieve feature refinement, which is very helpful for mining scale-space feature information;
multi-scale attention module process: let the input feature image of the MAB be F_in. First, convolutional layers with kernel sizes 1×1, 3×3, and 5×5 are applied, and the outputs are expressed as:

F¹_n×n = Conv_n×n(F_in; η¹_n×n), n ∈ {1, 3, 5},  (5)

where F¹_n×n denotes the first-layer output of the multi-scale convolution with kernel size n×n, Conv_n×n(·) denotes a convolution operation, and η¹_n×n denotes the parameters of the first multi-scale convolution layer with kernel size n×n. The outputs are further processed by multi-scale convolutions with kernel sizes 1×1, 3×3, and 5×5 to extract image features:

F²_n×n = Conv_n×n(F¹_n×n; η²_n×n),  (6)

where F²_n×n denotes the second-layer output of the multi-scale convolution with kernel size n×n and η²_n×n denotes the parameters of the second multi-scale convolution layer. Similarly, the output of the third multi-scale convolution layer can be expressed as:

F³_n×n = Conv_n×n(F²_n×n; η³_n×n),  (7)

after the multi-scale convolution layers, the MAB realizes intra-layer information fusion through convolutions with kernel sizes 1×1 and 3×3; finally, a CADB is introduced to improve the representation of useful image feature information, and the final output of the MAB is expressed as:

F_out = CADB(Conv_3×3(Conv_1×1(Concat(F³_1×1, F³_3×3, F³_5×5)))),  (8)

wherein F_out denotes the output of the MAB, CADB(·) denotes the parallel attention distillation module, and {η_1; η_2; η_3; η_4} denote the hyperparameters of the MAB output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210807681.7A CN115187482A (en) | 2022-07-09 | 2022-07-09 | Image rain removing algorithm based on multi-scale attention distillation learning network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115187482A true CN115187482A (en) | 2022-10-14 |
Family
ID=83517045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210807681.7A Pending CN115187482A (en) | 2022-07-09 | 2022-07-09 | Image rain removing algorithm based on multi-scale attention distillation learning network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115187482A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116843696A (en) * | 2023-04-27 | 2023-10-03 | 山东省人工智能研究院 | Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention |
CN116843696B (en) * | 2023-04-27 | 2024-04-09 | 山东省人工智能研究院 | Cardiac MRI (magnetic resonance imaging) segmentation method based on feature similarity and super-parameter convolution attention |
CN117455809A (en) * | 2023-10-24 | 2024-01-26 | 武汉大学 | Image mixed rain removing method and system based on depth guiding diffusion model |
CN117455809B (en) * | 2023-10-24 | 2024-05-24 | 武汉大学 | Image mixed rain removing method and system based on depth guiding diffusion model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115187482A (en) | Image rain removing algorithm based on multi-scale attention distillation learning network | |
CN111915530A (en) | End-to-end-based haze concentration self-adaptive neural network image defogging method | |
CN110503613B (en) | Single image-oriented rain removing method based on cascade cavity convolution neural network | |
CN116797488A (en) | Low-illumination image enhancement method based on feature fusion and attention embedding | |
CN116596792B (en) | Inland river foggy scene recovery method, system and equipment for intelligent ship | |
Chen et al. | Remote sensing image super-resolution via residual aggregation and split attentional fusion network | |
CN114219722A (en) | Low-illumination image enhancement method by utilizing time-frequency domain hierarchical processing | |
Li et al. | Deep scale-space mining network for single image deraining | |
Chen et al. | Multi-scale attentive residual dense network for single image rain removal | |
CN116883650A (en) | Image-level weak supervision semantic segmentation method based on attention and local stitching | |
Li et al. | An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network | |
CN113962878B (en) | Low-visibility image defogging model method | |
Lu et al. | Underwater image enhancement method based on denoising diffusion probabilistic model | |
CN113256538B (en) | Unsupervised rain removal method based on deep learning | |
CN117495718A (en) | Multi-scale self-adaptive remote sensing image defogging method | |
CN117350927A (en) | Image rain removing method based on deep learning | |
CN116128768A (en) | Unsupervised image low-illumination enhancement method with denoising module | |
Qiao et al. | Mutual channel prior guided dual-domain interaction network for single image raindrop removal | |
Mi et al. | Dense residual generative adversarial network for rapid rain removal | |
CN114627005B (en) | Rain density classification guided double-stage single image rain removing method | |
CN116258645A (en) | Low-illumination color image enhancement method based on image decomposition | |
CN114897883A (en) | Infrared and visible light image fusion method based on ResNet50 and double-pyramid | |
CN114549361A (en) | Improved U-Net model-based image motion blur removing method | |
CN114549306A (en) | Method and system for recovering spatial and spectral resolution of remote sensing image | |
Liu et al. | Indirect domain shift for single image dehazing |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||