CN113947538A - Multi-scale efficient convolution self-attention single image rain removing method


Info

Publication number
CN113947538A
Authority
CN
China
Prior art keywords
rain
image
network model
images
trained
Prior art date
Legal status
Pending
Application number
CN202111113807.2A
Other languages
Chinese (zh)
Inventor
王鑫
覃琴
李民谣
颜靖柯
王逸轩
Current Assignee
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202111113807.2A
Publication of CN113947538A
Legal status: Pending

Classifications

    • G06T5/73
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Fusion techniques
    • G06T2207/30192 Weather; Meteorology

Abstract

The invention discloses a multi-scale efficient convolution self-attention single image rain removing method. Corresponding rain and rain-free images are first obtained through image data preprocessing; the rain images are then fed into a network model fusing an improved Transformer self-attention module with a multi-scale spatial feature fusion module for iterative training. Optimized by a mixed loss function, the model outputs processed images close to the rain-free images, and the trained network model is saved; the trained network model is then used to predict on the image data to be tested and output derained images.

Description

Multi-scale efficient convolution self-attention single image rain removing method
Technical Field
The invention relates to the technical field of image processing, in particular to a multi-scale efficient convolution self-attention single image rain removing method.
Background
Rain is a common natural weather condition that severely degrades the imaging quality of images and video captured by outdoor vision systems, and in turn limits the performance of subsequent high-level computer vision tasks such as object tracking, target detection and image segmentation. Removing rain noise from a rainy image and restoring a clear background is therefore an important image preprocessing problem.
Because a single image provides little usable image feature information, single-image rain removal is challenging. Existing single-image rain removal methods fall into two categories: model-driven and data-driven. A model-driven method first establishes a physical model of the rain streaks based on prior knowledge, such as the physical characteristics of rain, then removes rain noise from the rainy image with a series of hand-designed mathematical models, and finally obtains a clean, rain-free background image. However, model-driven rain removal methods are only suitable for specific rain types and cannot cope with the irregular distribution of real rain images, and the optimization algorithms they adopt usually involve many computational iterations, so their efficiency is low. Data-driven rain removal methods exploit the strong feature extraction capability of deep learning models and learn the features of rain streaks and the effective background information by training on large datasets, thereby recovering a rain-free image from a rainy one.
Disclosure of Invention
The invention aims to provide a multi-scale efficient convolution self-attention single image rain removing method that solves the technical problems of the large computational cost and low efficiency of prior-art single-image rain removal methods.
To achieve this purpose, the invention adopts a multi-scale efficient convolution self-attention single image rain removing method comprising the following steps:
preprocessing data;
constructing a network model;
training the network model;
optimizing a network model;
and predicting and outputting the image after rain removal.
In the data preprocessing step, the image data are preprocessed to obtain rain images and rain-free images, which depict a rainy scene and a rain-free scene of the same environment, respectively.
The rain images serve as the initial training data, and the rain-free images serve as the reference data for comparison after processing.
The network model comprises an encoding structure and a decoding structure. The encoding structure fuses an improved Transformer self-attention module and embeds a multi-scale spatial feature fusion module; the decoding structure comprises conventional efficient convolution blocks and fuses the semantic features of corresponding scales from the encoding structure.
Position coding is added to the improved Transformer self-attention module, so that it not only can model global features but is also sensitive to locally similar features, which helps remove rain noise while preserving background detail textures to the greatest extent. Embedding the multi-scale spatial feature fusion block in the encoding stage alleviates the loss of partial image features during downsampling.
In the network model training step, the optimal parameters of a pre-trained model are first loaded into the network model, where the pre-trained model is the network model trained before the network improvement; the rain images are then fed into the network model for iterative training.
In the network model optimization step, the network parameters of the network model are iteratively updated through back-propagation of a mixed loss function so that the output approaches the rain-free image, and the trained network model is saved.
The rain images are processed iteratively in the network model while the model is being trained. Under optimization by the mixed loss function, the output processed images come ever closer to the rain-free images; the network model at that point is the trained network model and can be used to remove rain from other images.
In the prediction step, the prepared test image data are loaded into the trained network model for forward computation, yielding the derained version of each test image.
In the multi-scale efficient convolution self-attention single image rain removing method of the invention, corresponding rain and rain-free images are first obtained through image data preprocessing; the rain images are then fed into a network model fusing an improved Transformer self-attention module with a multi-scale spatial feature fusion module for iterative training; optimized by the mixed loss function, the model outputs processed images close to the rain-free images and the trained network model is saved; the trained network model is then used to predict on the image data to be tested and output the derained images.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of a multi-scale efficient convolution self-attention single image rain removal method according to the invention.
FIG. 2 is a network model structure diagram of a multi-scale efficient convolution self-attention single image rain removal method according to the invention.
FIG. 3 is a block diagram of the multi-scale spatial feature fusion module of the present invention.
Fig. 4 is a comparison of subjective experimental results on a synthetic data set Rain100H for different algorithms in an embodiment of the invention.
FIG. 5 is a comparison of the average running time and evaluation metrics of different algorithms on Rain100H in an embodiment of the present invention.
Fig. 6 shows subjective experimental results of different algorithms on the near-real dataset SPA in an embodiment of the invention.
FIG. 7 is a structural comparison diagram of two combination schemes of the cross-scale convolution self-attention module of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.
Referring to fig. 1, the present invention provides a multi-scale efficient convolution self-attention single image rain removing method, comprising the following steps:
s1: preprocessing data;
s2: constructing a network model;
s3: training the network model;
s4: optimizing a network model;
s5: and predicting and outputting the image after rain removal.
In the data preprocessing step, the image data are preprocessed to obtain rain images and rain-free images, which depict a rainy scene and a rain-free scene of the same environment, respectively.
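To make this pairing concrete, a minimal PyTorch dataset sketch follows; the directory layout, matching file names and [0, 1] normalization are assumptions made for illustration, not details fixed by the patent:

```python
import os

from torch.utils.data import Dataset
from torchvision.io import read_image


class RainPairDataset(Dataset):
    """Loads paired rainy / rain-free images of the same scene.

    Assumes each rainy image in `rainy_dir` has a same-named rain-free
    counterpart in `clean_dir` (an illustrative convention)."""

    def __init__(self, rainy_dir: str, clean_dir: str):
        self.rainy_dir, self.clean_dir = rainy_dir, clean_dir
        self.names = sorted(os.listdir(rainy_dir))

    def __len__(self) -> int:
        return len(self.names)

    def __getitem__(self, idx: int):
        name = self.names[idx]
        # read as CHW uint8 tensors and normalize to [0, 1]
        rainy = read_image(os.path.join(self.rainy_dir, name)).float() / 255.0
        clean = read_image(os.path.join(self.clean_dir, name)).float() / 255.0
        return rainy, clean  # training input and its rain-free reference
```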
The network model comprises an encoding structure and a decoding structure. The encoding structure fuses an improved Transformer self-attention module and embeds a multi-scale spatial feature fusion module. In addition to conventional Efficient Convolution Blocks (ECB), the decoding structure fuses the semantic features of corresponding scales from the encoding structure through Skip Connection operations, which guides the upsampling process in the decoding stage, establishes long-distance feature dependencies, and helps restore image details.
As shown in FIG. 2, the network body is an encoding-decoding structure into which a cross-scale convolution self-attention module, improved on the basis of the original Transformer, and a multi-scale spatial feature fusion module are integrated.
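A bare structural sketch of such an encoding-decoding body with a skip connection is given below in PyTorch. The depth, channel widths and layer choices are illustrative only, and the patent's attention and fusion modules are omitted for brevity:

```python
import torch
import torch.nn as nn


class EncoderDecoderSketch(nn.Module):
    """U-shaped encoder-decoder with a skip connection at the matching scale;
    a structural sketch, not the patent's full network."""

    def __init__(self, ch: int = 32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)      # downsampling
        self.enc2 = nn.Sequential(nn.Conv2d(ch * 2, ch * 2, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(ch * 2, ch, 2, stride=2)          # upsampling
        # skip connection: concatenate encoder features of the same scale
        self.dec1 = nn.Sequential(nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.down(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # Skip Connection fusion
        return self.out(d1)
```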
Improved Transformer self-attention module: in the 3rd and 4th stages of the downsampling (encoding) stage, the improved Transformer self-attention module is fused in. Its calculation formula is as follows:

Attention(Q, K, V) = softmax(QKᵀ/√d)·V + P   (1)

where the query vectors Q, key vectors K and value vectors V are obtained by mapping the input X ∈ R^{N×C} through the linear transformations W_Q, W_K and W_V ∈ R^{C×C}; √d is the scaling factor; softmax is the activation function; and softmax(QKᵀ/√d) represents the attention map. For visual tasks, the position-encoding term P is computed using a two-dimensional depthwise convolution:

P = Q ∘ DWConv(V)   (2)

where ∘ denotes element-wise multiplication of the entries at corresponding matrix positions and DWConv denotes the two-dimensional depthwise convolution. With position coding added, the improved Transformer self-attention module not only can model global features but is also sensitive to locally similar features, which helps clear rain noise while retaining background detail textures to the greatest extent.
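For illustration, a minimal PyTorch sketch of a self-attention block of this kind follows. It assumes the position-encoding branch takes the form P = Q ∘ DWConv(V) as in formula (2); the class name, head count and kernel size are illustrative, and this is a sketch rather than the patent's exact implementation:

```python
import torch
import torch.nn as nn


class ConvSelfAttention(nn.Module):
    """Scaled dot-product self-attention plus a depthwise-convolution
    position-encoding branch (sketch under stated assumptions)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5          # 1/sqrt(d)
        self.qkv = nn.Linear(dim, dim * 3)               # W_Q, W_K, W_V fused
        self.proj = nn.Linear(dim, dim)
        # 2-D depthwise convolution used for the position encoding
        self.pos_conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) with N = h * w flattened spatial positions
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, c // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # each: (B, heads, N, d)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)                      # attention map
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)

        # position encoding: P = Q ∘ DWConv(V), element-wise product
        v2d = v.transpose(1, 2).reshape(b, n, c).transpose(1, 2).reshape(b, c, h, w)
        pos = self.pos_conv(v2d).flatten(2).transpose(1, 2)   # (B, N, C)
        q_full = q.transpose(1, 2).reshape(b, n, c)
        out = out + q_full * pos

        return self.proj(out)


# usage: y = ConvSelfAttention(dim=64)(torch.randn(2, 32 * 32, 64), h=32, w=32)
```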
Multi-scale spatial feature fusion module: image rain removal algorithms usually apply several downsampling operations to discard redundant information in the image, but part of the effective information is lost at the same time, so the positions of the rain streaks cannot be located accurately, leading to problems such as incompletely removed rain streaks and an incomplete image background structure. To address the loss of partial image features during downsampling in the encoding stage, the invention designs a multi-scale spatial feature fusion module, embedded in the last stage of the encoding stage, that aggregates context information from multiple scales so as to fully learn rain streak features of different sizes, enabling the network model to cope with the various complex rainfall situations of real environments. The specific structure of the multi-scale spatial feature fusion module is shown in fig. 3. Five parallel convolution operations process the input features: first, a 1×1 convolution reduces the dimension of the input feature map; then three 3×3 convolutions with different dilation factors (2, 4 and 8) extract features at three different receptive fields, improving the model's perception of rain streaks of different sizes; next, an adaptive average pooling operation reduces information redundancy; finally, a 1×1 convolution reduces the number of channels, and the five feature maps of different scales are fused together so that the effective information at different scales in the image can be fully learned.
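As a concrete illustration of this five-branch structure, the following is a minimal PyTorch sketch; the class name, channel widths and the bilinear upsampling of the pooled branch are assumptions made for the example, not details fixed by the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSpatialFeatureFusion(nn.Module):
    """Five parallel branches aggregating context at several receptive
    fields, following the structure described above (sketch)."""

    def __init__(self, in_ch: int, mid_ch: int = 64):
        super().__init__()
        # branch 1: 1x1 convolution for channel reduction
        self.branch1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        # branches 2-4: 3x3 dilated convolutions, dilation factors 2 / 4 / 8
        self.branch2 = nn.Conv2d(in_ch, mid_ch, 3, padding=2, dilation=2)
        self.branch3 = nn.Conv2d(in_ch, mid_ch, 3, padding=4, dilation=4)
        self.branch4 = nn.Conv2d(in_ch, mid_ch, 3, padding=8, dilation=8)
        # branch 5: adaptive average pooling to suppress redundant information
        self.branch5 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
        )
        # final 1x1 convolution fuses the five maps and restores the channels
        self.fuse = nn.Conv2d(mid_ch * 5, in_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        b5 = F.interpolate(self.branch5(x), size=(h, w),
                           mode="bilinear", align_corners=False)
        feats = [self.branch1(x), self.branch2(x),
                 self.branch3(x), self.branch4(x), b5]
        return self.fuse(torch.cat(feats, dim=1))


# usage: y = MultiScaleSpatialFeatureFusion(256)(torch.randn(1, 256, 32, 32))
```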
In the network model training step, the rain images are fed into the network model for iterative training.
In the network model optimization step, the network parameters of the network model are iteratively updated through back-propagation of a mixed loss function so that the output approaches the rain-free image, and the trained network model is saved.
A training loss objective function is given. The loss function mixes the MAE loss, the MS-SSIM loss, the MSE loss and the TV loss, using the strengths of each loss function to compensate for the weaknesses the others show individually and enhancing the stability of the network. First, the MAE loss and the MS-SSIM loss are mixed with a certain weight, with the following formula:

L_MS-SSIM-MAE = α·L_MS-SSIM + (1 − α)·L_MAE   (3)

where L_MS-SSIM is the MS-SSIM loss and L_MAE is the MAE loss, calculated as follows:

L_MS-SSIM(P) = 1 − MS-SSIM(p)   (4)

L_MAE(P) = (1/N)·Σ_{p∈P} |x(p) − y(p)|   (5)

where P represents a block of pixels, p represents a pixel point in the region P, x(p) and y(p) denote the output and reference values at p, N is the number of pixels, and α is empirically set to 0.84.
Moreover, the MS-SSIM loss is not particularly sensitive to consistency deviations and easily causes brightness changes and color shifts in the image. The TV loss constrains the smoothness of the image by calculating differences between adjacent pixels, making the output relatively smooth, which can resolve the artifact problem caused by rain streak residue in the derained image; but it is not suitable for use alone. In a rainy image, the rain streaks and the background detail textures mostly lie in high-frequency regions, and the MAE loss gives a relatively large weight to the high-frequency parts of the image, so rain streak fragments remain while details are retained; the MSE loss and the TV loss are therefore used to remove rain streak artifacts. Finally, the mixed loss function is as follows:
L_Mix = L_MS-SSIM-MAE + μ·L_MSE + λ·L_TV   (6)
where μ and λ are penalty factors, adjusted step by step through experiments to the values 0.3 and 2×10⁻⁸, respectively. The expressions for the MSE loss and the TV loss are as follows:
L_MSE(P) = (1/N)·Σ_{p∈P} (x(p) − y(p))²   (7)

L_TV(p) = Σ_{i,j} ((p_{i,j+1} − p_{i,j})² + (p_{i+1,j} − p_{i,j})²)^{β/2}   (8)
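Formulas (3) through (8) can be assembled into a single loss module. The sketch below is one possible realization: it relies on the third-party pytorch-msssim package for the MS-SSIM term (an assumption; the patent names no implementation) and fixes β = 2 in the TV term (also an assumption), with α = 0.84, μ = 0.3 and λ = 2×10⁻⁸ as stated above:

```python
import torch
import torch.nn as nn
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim


class MixLoss(nn.Module):
    """L_Mix = L_MS-SSIM-MAE + mu * L_MSE + lambda * L_TV, formulas (3)-(8).

    Note: ms_ssim with default settings needs inputs larger than ~160x160."""

    def __init__(self, alpha: float = 0.84, mu: float = 0.3, lam: float = 2e-8):
        super().__init__()
        self.alpha, self.mu, self.lam = alpha, mu, lam
        self.l1 = nn.L1Loss()   # MAE term, formula (5)
        self.l2 = nn.MSELoss()  # MSE term, formula (7)

    @staticmethod
    def tv_loss(x: torch.Tensor) -> torch.Tensor:
        # total variation: squared differences of horizontally / vertically
        # adjacent pixels (formula (8) with beta = 2 assumed)
        dh = (x[:, :, :, 1:] - x[:, :, :, :-1]).pow(2).sum()
        dv = (x[:, :, 1:, :] - x[:, :, :-1, :]).pow(2).sum()
        return dh + dv

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        l_msssim = 1.0 - ms_ssim(pred, target, data_range=1.0)      # formula (4)
        l_mix_34 = self.alpha * l_msssim + (1 - self.alpha) * self.l1(pred, target)
        return (l_mix_34                                             # formula (3)
                + self.mu * self.l2(pred, target)
                + self.lam * self.tv_loss(pred))                     # formula (6)
```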
and iterating the network parameters of the network model through back propagation optimization of the obtained mixed loss function to enable the output result of the network to gradually approach the rain-free image, and storing the trained model.
In the prediction step, the prepared test image data and the trained model parameters are loaded into the network for forward computation, yielding the derained version of each test image.
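This forward-computation step might look as follows in PyTorch; the I/O helpers and file paths are illustrative assumptions:

```python
import torch
from torchvision.io import read_image
from torchvision.utils import save_image


@torch.no_grad()
def predict(model, image_path, weight_path, device="cuda"):
    """Forward pass of the trained model on one test image (sketch)."""
    model.load_state_dict(torch.load(weight_path))  # trained weights
    model.to(device).eval()
    # read the rainy test image, normalize to [0, 1], add batch dimension
    rainy = read_image(image_path).float().div(255.0).unsqueeze(0).to(device)
    derained = model(rainy)                          # forward computation only
    save_image(derained.clamp(0.0, 1.0), "derained.png")
```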
Further, the present invention provides a specific example of experimental comparison using a synthetic data set:
in order to comprehensively verify the performance of the technical scheme, the invention is compared with several advanced rain removing methods based on deep learning at present, and the method specifically comprises the following steps: MPRNET (2021), RCDNet (2020), JORDER-E (2020), DCSFN (2020), SPANet (2019), RESCAN (2018).
Referring to fig. 4, which shows the subjective experimental results of the different algorithms on the synthetic dataset Rain100H, it can be seen that the invention effectively removes rain streaks of different directions and densities and generates a near-real rain-free image while retaining most details. In contrast, the images generated by the other methods are over-smoothed and even destroy the background content: in fig. 4(a) the woman's face shows smear-like blocks, in fig. 4(b) the girl's hair texture has almost disappeared, and in fig. 4(c) the cross building is blurred; only the MPRNet method retains some details. The DCSFN algorithm uses SSIM loss as its loss function, which makes structural boundaries pronounced but also leaves rain streak residue; for example, rain streak artifacts are clearly visible in the sky of the DCSFN output in fig. 4(c). The RCDNet algorithm uses an MSE loss function, which penalizes smooth regions in the image more heavily and blurs the image. The image output by the algorithm of the invention not only removes the rain streaks completely but also renders the woman's face naturally, the girl's hair texture clearly and the cross intact, fully retaining the background details. This demonstrates the effective combination of the local feature modeling capability of convolution with the global feature modeling capability of self-attention, and also verifies the feasibility of the mixed loss function in this scheme.
Besides comparing the subjective effect of each algorithm, to quantify the performance improvement brought by the proposed algorithm, the invention objectively evaluates each algorithm with two image quality metrics: structural similarity (SSIM) and peak signal-to-noise ratio (PSNR). The closer the SSIM value is to 1, the more similar the two images; the larger the PSNR value, the less the image distortion. Table 1 gives the SSIM and PSNR values of each algorithm on the different datasets. As can be seen from Table 1, the algorithm of the invention is quite competitive with the advanced algorithms. Considering the image quality metrics alone: although on the Rain100L dataset the method is slightly inferior to RCDNet in PSNR and slightly below JORDER-E in SSIM, on the Rain100H dataset the method of the invention leads, improving SSIM by 0.0153 and PSNR by 0.95 dB over the latest MPRNet algorithm. A probable reason is that the weak inductive bias of the Transformer requires a larger training set, making it suited to large-scale datasets, so the method gains a greater advantage on the larger Rain100H dataset.
TABLE 1 Comparison of evaluation indices of different algorithms on the synthetic datasets
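For reference, the two metrics can be computed as follows; the PSNR formula is standard, while the SSIM call assumes the pytorch-msssim package rather than any implementation named by the patent:

```python
import torch
from pytorch_msssim import ssim  # SSIM: closer to 1 means more similar


def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; larger means less distortion."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))


def evaluate(pred: torch.Tensor, target: torch.Tensor):
    """Returns (SSIM, PSNR) for a batch of derained / reference images."""
    return float(ssim(pred, target, data_range=1.0)), psnr(pred, target)
```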
In addition, to demonstrate the rain removal efficiency of the method, the invention also compares the average running time for processing one rainy image and the PSNR and SSIM metrics of the different algorithms on the Rain100H dataset. The comparison, shown in fig. 5, indicates that the algorithm of the invention is on par with RCDNet in PSNR but about 50 times faster in processing speed; compared with the latest algorithm MPRNet, it attains higher metric values with a far faster deraining speed. This benefit comes from the use of efficient convolution blocks, which accelerate model inference.
Further, to verify the generalization ability of the proposed algorithm, the invention also compares it with the latest algorithm MPRNet (2021) on a near-real rainy dataset. The comparison, shown in fig. 6, indicates that the two algorithms have comparable deraining effect on near-real rainy images, but the proposed method retains more details: in fig. 6(a), the MPRNet algorithm removes a long white object from the background of the original image, whereas the proposed algorithm keeps it intact in the derained image. Experiments prove that the algorithm has strong generalization ability: it effectively removes rain streaks of different degrees from synthetic rainy images and also clears rain well from near-real rainy scene images.
Further, for the manner of fusing the cross-scale convolution self-attention module in the encoding stage, two combination schemes are designed, as shown in fig. 7:
in order to compare the influence of the number of the cross-scale convolution self-attention module combined with the common convolution on the performance of the proposed network model and the effectiveness of the module in feature extraction in images with different resolutions, two different combination schemes are trained on a data set Rain100H, and evaluation indexes SSIM and PSNR of the two schemes on the two data sets are shown. As shown in table 2, the evaluation index values obtained after training on the Rain100H data set using the two schemes of combination a and combination B are equivalent, combination B is slightly advanced, but combination B contains fewer transform modules, which means fewer parameters and computational consumption. Thus, the scheme herein employing combination B combines volume blocks and transformers.
TABLE 2 Comparison of evaluation indices of combination A and combination B on the Rain100H dataset
Here, the mixed loss function is used as the objective function to be optimized:

L_Mix = L_MS-SSIM-MAE + μ·L_MSE + λ·L_TV   (9)
while the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A multi-scale efficient convolution self-attention single image rain removing method is characterized by comprising the following steps:
preprocessing data;
constructing a network model;
training the network model;
optimizing a network model;
and predicting and outputting the image after rain removal.
2. The method as claimed in claim 1, wherein in the data preprocessing step, the image data are preprocessed to obtain a rain image and a rain-free image, the rain image and the rain-free image being a rainy scene and a rain-free scene of the same environment, respectively.
3. The method as claimed in claim 2, wherein the network model comprises an encoding structure and a decoding structure, the encoding structure fuses an improved Transformer self-attention module and further embeds a multi-scale spatial feature fusion module, and the decoding structure comprises conventional efficient convolution blocks and fuses semantic features of corresponding scales in the encoding structure.
4. The method as claimed in claim 3, wherein in training the network model, parameters of a pre-trained model are loaded into the network model, the pre-trained model being the network model trained before the network improvement, and the rain images are then fed into the network model for iterative training.
5. The method as claimed in claim 4, wherein in the network model optimization, the network parameters of the network model are iteratively updated through back-propagation optimization of a mixed loss function so that the output result approaches the rain-free image, and the trained network model is saved.
6. The method as claimed in claim 5, wherein in predicting and outputting the derained image, the prepared test image data are loaded into the trained network model for forward computation, and the relatively optimal weight and bias parameters obtained through back-propagation updates are applied to the input to obtain the derained image of the test image.
CN202111113807.2A 2021-09-23 2021-09-23 Multi-scale efficient convolution self-attention single image rain removing method Pending CN113947538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111113807.2A CN113947538A (en) 2021-09-23 2021-09-23 Multi-scale efficient convolution self-attention single image rain removing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111113807.2A CN113947538A (en) 2021-09-23 2021-09-23 Multi-scale efficient convolution self-attention single image rain removing method

Publications (1)

Publication Number Publication Date
CN113947538A 2022-01-18

Family

ID=79329015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111113807.2A Pending CN113947538A (en) 2021-09-23 2021-09-23 Multi-scale efficient convolution self-attention single image rain removing method

Country Status (1)

Country Link
CN (1) CN113947538A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115112669A (en) * 2022-07-05 2022-09-27 重庆大学 Pavement nondestructive testing identification method based on small sample
US11908124B2 (en) 2022-07-05 2024-02-20 Chongqing University Pavement nondestructive detection and identification method based on small samples
CN116824372A (en) * 2023-06-21 2023-09-29 中国水利水电科学研究院 Urban rainfall prediction method based on Transformer
CN116824372B (en) * 2023-06-21 2023-12-08 中国水利水电科学研究院 Urban rainfall prediction method based on Transformer

Similar Documents

Publication Publication Date Title
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
CN109087258B (en) Deep learning-based image rain removing method and device
CN109087273B (en) Image restoration method, storage medium and system based on enhanced neural network
CN110599401A (en) Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
CN110503613B (en) Single image-oriented rain removing method based on cascade cavity convolution neural network
CN113658051A (en) Image defogging method and system based on cyclic generation countermeasure network
CN113947538A (en) Multi-scale efficient convolution self-attention single image rain removing method
CN111161360A (en) Retinex theory-based image defogging method for end-to-end network
CN115345866B (en) Building extraction method in remote sensing image, electronic equipment and storage medium
CN111553851A (en) Video rain removing method based on time domain rain line decomposition and spatial structure guidance
CN116912257B (en) Concrete pavement crack identification method based on deep learning and storage medium
CN113506224A (en) Image restoration method based on multi-scale generation countermeasure network
CN115063318A (en) Adaptive frequency-resolved low-illumination image enhancement method and related equipment
Zheng et al. T-net: Deep stacked scale-iteration network for image dehazing
Lin et al. Single image deraining via detail-guided efficient channel attention network
CN114155171A (en) Image restoration method and system based on intensive multi-scale fusion
CN113256519A (en) Image restoration method, apparatus, storage medium, and program product
Singh et al. Weakly supervised image dehazing using generative adversarial networks
CN115631223A (en) Multi-view stereo reconstruction method based on self-adaptive learning and aggregation
Rani et al. ELM-Based Shape Adaptive DCT Compression technique for underwater image compression
Zhu et al. HDRD-Net: High-resolution detail-recovering image deraining network
CN114549302A (en) Image super-resolution reconstruction method and system
CN113763268A (en) Blind restoration method and system for face image
CN115705493A (en) Image defogging modeling method based on multi-feature attention neural network
Wu et al. Semantic image inpainting based on generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination