CN112348762A - Single-image rain removal method based on a multi-scale fusion generative adversarial network - Google Patents
Single-image rain removal method based on a multi-scale fusion generative adversarial network
- Publication number
- CN112348762A (application CN202011385947.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- rain
- network
- generated
- map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 230000004927 fusion Effects 0.000 title claims abstract description 45
- 238000000034 method Methods 0.000 title claims abstract description 39
- 238000012549 training Methods 0.000 claims abstract description 31
- 238000012360 testing method Methods 0.000 claims abstract description 10
- 230000006870 function Effects 0.000 claims description 25
- 230000000694 effects Effects 0.000 claims description 16
- 238000013527 convolutional neural network Methods 0.000 claims description 9
- 238000001514 detection method Methods 0.000 claims description 7
- 230000004913 activation Effects 0.000 claims description 6
- 238000010606 normalization Methods 0.000 claims description 4
- 238000003708 edge detection Methods 0.000 claims description 3
- 238000005070 sampling Methods 0.000 claims description 3
- 230000011218 segmentation Effects 0.000 claims description 3
- 230000008447 perception Effects 0.000 claims description 2
- 238000007781 pre-processing Methods 0.000 claims description 2
- 230000000007 visual effect Effects 0.000 abstract description 5
- 238000011156 evaluation Methods 0.000 abstract description 3
- 238000012545 processing Methods 0.000 description 8
- 238000013135 deep learning Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 2
- 230000003042 antagonistic effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/92—Dynamic range modification of images or parts thereof based on global image properties
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention provides a single-image rain removal method based on a multi-scale fusion generative adversarial network. First, saliency detection is applied to the rainy image to obtain a saliency map; the saliency map and the rainy image are then fused by Concat so that the raindrop regions to be repaired can be identified accurately. A de-raining network is then built with multi-scale fusion: the l12, l14 and l16 layers of the generator network are fused at multiple scales to improve the quality of the generated de-rained image, and the final network model is obtained by training with a discriminator network that combines global and local discrimination. The test set is input into the trained model to obtain de-rained images, which are evaluated by the SSIM and PSNR indices. Images generated by the proposed method have better visual quality, the restored raindrop regions are more realistic and continuous, and every evaluation index is improved.
Description
Technical Field
The invention relates to the technical field of image restoration, in particular to a single-image rain removal method based on a multi-scale fusion generative adversarial network.
Background
Computer vision is a key technology behind functions such as automatic driving and video surveillance, and its effectiveness depends on image quality. In rainy conditions, captured images and videos are easily scattered and blurred by raindrops: blurring loses a great deal of information, reduced visibility seriously degrades the visual quality of the captured images, and outdoor vision is strongly affected.
At the present stage, many de-raining solutions have been developed around the restoration of rainy images, and their effectiveness keeps improving. With the rapid development of machine learning, image de-raining algorithms based on deep learning have multiplied. A single-image de-raining method based on a convolutional neural network has been proposed in the industry, but it performs poorly on images with large, dense rain, and its output images have low resolution and poor visual quality. Rainy images have also been de-rained with a Pix2Pix network, but experiments show that this network still leaves unnatural traces in the restored rain regions. When the various methods designed in single-image de-raining research repair a rain region, local areas remain blurred and much detail information is lost, so providing a more effective deep-learning-based single-image de-raining method remains an open problem for technicians in the field.
Disclosure of Invention
The invention aims to provide a single-image rain removal method based on a multi-scale fusion generative adversarial network. A saliency map of the rainy image is first obtained through saliency detection; the saliency map and the rainy image are then fused by Concat and fed into the multi-scale fusion generative adversarial network so that it attends accurately to the raindrop regions to be repaired. De-raining training is then carried out on the network, alternately updating the parameters of the generator network and the discriminator network to obtain the final de-raining model. The test set is input into the trained model to obtain de-rained images, completing the de-raining of single images, and the generated de-rained images are evaluated by the SSIM and PSNR indices.
Specifically, the present invention achieves the above object by the following scheme:
a single-image rain removal method based on a multi-scale fusion generative adversarial network comprises the following steps:
s1, establishing a data set comprising a training set and a test set, the training set and the test set each comprising image pairs of a rainy image and a corresponding clear rain-free image, the image pairs being obtained by preprocessing original images;
the step S1 includes the steps of:
s1.1, selecting image pairs each consisting of a rainy image and a corresponding clear rain-free image;
s1.2, performing brightness and contrast enhancement on the image pairs of step S1.1;
s1.3, performing image flipping on the image pairs of steps S1.1 and S1.2;
s1.4, after the preprocessing of steps S1.1 to S1.3, the images form a data set, which is divided into the training set and the test set in a certain proportion.
S2, generating a saliency map through saliency detection according to the rain images in the training set: firstly, performing superpixel segmentation on the rained image in the training set by adopting an SLIC algorithm, then determining a raindrop area through image significance, detecting raindrop pixels by combining a raindrop model, and then performing edge detection on the rained image to determine the raindrop area to generate the saliency map;
s3, fusing the saliency map generated in the step S2 and the rain images in the training set in a Concat mode of combining in channel dimension to generate a fused image;
s4, sending the fusion images generated in the step S3 into a multi-scale fusion generation confrontation network for rain removal training to obtain a network model capable of removing rain from the rain images in the training set;
the step S4 includes the steps of:
s4.1, inputting the fused image generated in step S3 into the generator network of the multi-scale fusion generative adversarial network;
the generator network of the multi-scale fusion generative adversarial network comprises 16 convolutional layers; l12, l14 and l16 are up-sampled by a Deconv function, batch normalization and ReLU activation functions are added to each, the three layers l12, l14 and l16 are then fused by the Concat method, and the generated de-rained image G(R) is output;
the loss function of the generator network of the multi-scale fusion generative adversarial network comprises an adversarial term, a multi-scale loss and a perceptual loss, and is defined as:

L_G = 10^{-2} L_{GAN}(O) + L_m({S}, {T}) + L_p(O, T)

where L_{GAN}(O) = log(1 - D(O)); the multi-scale loss function is

L_m({S}, {T}) = \sum_i \lambda_i L_{MSE}(S_i, T_i)

where L_{MSE} denotes the mean squared error, S_i denotes the i-th output extracted from the generator layers, T_i denotes the corresponding clear rain-free image at the same scale as S_i, and \lambda_i are the weights of the different scales; the perceptual loss function is:

L_p(O, T) = L_{MSE}(VGG(O), VGG(T))

where VGG denotes the feature-space representation generated from a given input image, O is the de-rained image output by the generator, and T is the clear rain-free image corresponding to the rainy image;
s4.2, inputting the de-rained image G(R) generated in step S4.1 and the clear rain-free image corresponding to it into the discriminator network of the multi-scale fusion generative adversarial network, computing the discriminator network's output probability P_clean, and determining whether the input comes from the clear rain-free image or from the generated de-rained image G(R);
the discriminator network of the multi-scale fusion generative adversarial network comprises 7 convolutional layers; global discrimination, which checks the consistency of the whole image, is combined with local discrimination, which checks specific regions; features are extracted from the penultimate convolutional layer and provided to a convolutional neural network, the output of the convolutional neural network is multiplied with the original features in the discriminator network and then passed into the next layer, guiding the discriminator network to focus on the rain region determined by the saliency map, and finally a fully connected layer judges whether the input image is real or fake;
the loss function of the discriminator network is:

L_D(O, T) = -log(D(T)) - log(1 - D(O)) + \gamma L_{map}(O, T)

where L_{map}, the loss on the saliency map generated in the discriminator network, is defined as:

L_{map}(O, T) = L_{MSE}(D_{map}(O), M) + L_{MSE}(D_{map}(T), 0)

where D_{map} denotes the two-dimensional saliency map generated by the discriminator network, M is the saliency map of the rainy image, T is the clear rain-free image, and 0 denotes a map containing only zero values;
s4.3, the generator network and the discriminator network are trained adversarially, alternately updating their parameters; when the discriminator network can no longer correctly estimate whether the input comes from the generated de-rained image G(R) or from the corresponding clear rain-free image, a network model capable of removing rain from the rainy images in the training set is obtained;
and S5, inputting the rainy images in the test set into the network model to obtain de-rained images, and evaluating the effect of the network model.
Compared with the prior art, the invention has the following beneficial effects: to address the uncertainty of the rain region, image saliency is used to detect and locate the rain region to be removed, and Concat multi-scale fusion is applied to the l12, l14 and l16 layer networks in the generator network. This better preserves the consistency of the image's semantic information and texture structure during de-raining, avoids blurring of the de-rained image, and improves its quality. Every evaluation index is markedly better than with conventional methods, the generated images have better visual quality, and the restored raindrop regions are more realistic and continuous.
Drawings
FIG. 1 is the overall flow chart of the single-image rain removal method based on a multi-scale fusion generative adversarial network according to the present invention;
FIG. 2 is the flow chart of saliency detection in the method;
FIG. 3 is the structure of the generative adversarial network in the method;
FIG. 4 shows image processing results of the method, where (a) is a rainy image, (b) is the de-rained image generated by the model, and (c) is the corresponding clear rain-free image;
FIG. 5 compares the de-raining effect of the method and other algorithms on the same image, where (a) is a rainy image, (b) is the clear rain-free image, (c), (d) and (e) are de-rained images generated by three other algorithms, and (f) is the de-rained image generated by the method of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
As shown in figs. 1-3, the present invention provides a single-image rain removal method based on a multi-scale fusion generative adversarial network, which comprises the following steps; the overall flow chart is shown in fig. 1:
step S1: an image data set is constructed.
The raw data set consists of 1119 pairs of rainy and clear rain-free images, split into a training set (860 pairs) and a test set (259 pairs). To compensate for the small number of samples, the data set is augmented in two ways: first, contrast enhancement at a ratio of 1:1.5 together with a brightness lift; second, image flipping, in which the rainy and clear images of the original data set are rotated by 180° to enrich the data set, improving the generalization ability and robustness of the model during training.
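As an illustration of this augmentation step, the following minimal Python/PIL sketch yields the three variants of each image pair; the 1.5 enhancement factor mirrors the 1:1.5 ratio above, while applying the same factor to both contrast and brightness is an assumption.

```python
from PIL import Image, ImageEnhance

def augment_pair(rainy: Image.Image, clean: Image.Image):
    """Yield the original (rainy, clean) pair plus its two augmented variants."""
    yield rainy, clean

    # Variant 1: contrast enhancement at 1:1.5 with a matching brightness lift,
    # applied identically to both images so the pair stays pixel-aligned.
    def enhance(img: Image.Image) -> Image.Image:
        img = ImageEnhance.Contrast(img).enhance(1.5)
        return ImageEnhance.Brightness(img).enhance(1.5)

    yield enhance(rainy), enhance(clean)
    # Variant 2: 180-degree rotation ("image inversion") of the original pair.
    yield rainy.rotate(180), clean.rotate(180)
```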
Step S2: image saliency detection, that is, accurately detecting the raindrop regions and their edge regions in the image to obtain a saliency map.
Step S2.1: superpixel segmentation using the SLIC algorithm, which reduces the computational complexity of subsequent processing while preserving the integrity of the target features.
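A minimal sketch of this sub-step with scikit-image's SLIC implementation; the segment count and compactness are illustrative choices, not values fixed by the patent.

```python
from skimage import io
from skimage.segmentation import slic

rainy = io.imread("rainy.png")  # hypothetical path to an RGB rainy image
# SLIC clusters pixels into compact superpixels that respect object boundaries,
# so later saliency computations run per region instead of per pixel.
labels = slic(rainy, n_segments=400, compactness=10, start_label=1)
print("number of superpixels:", labels.max())
```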
Step S2.2: determine the raindrop regions through image saliency, and detect raindrop pixels in combination with a raindrop model.
The raindrop model combines a background image with a raindrop effect, as follows:
D = (1 - M(x)) ⊙ C + R
where D is the input rainy image; M is a binary mask in which M(x) = 1 indicates that pixel x is part of a raindrop region and M(x) = 0 that it belongs to the background; C is the clear image; R is the raindrop effect; and ⊙ denotes element-wise multiplication. For the rainy images in the training set constructed in step S1, subtracting the corresponding clear image C from the rainy image D yields the mask of the region affected by raindrops, and image saliency detection then determines whether that region is part of a raindrop area.
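The subtraction-based mask extraction just described can be sketched as follows (NumPy; the difference threshold is an illustrative value, not one given by the patent).

```python
import numpy as np

def raindrop_mask(rainy: np.ndarray, clean: np.ndarray, thresh: float = 30.0) -> np.ndarray:
    """Estimate the binary mask M of D = (1 - M(x)) ⊙ C + R by thresholding |D - C|.

    Inputs are uint8 RGB arrays on a 0-255 scale; a pixel is marked as a raindrop
    candidate when any channel of the rainy image deviates strongly from the clean one."""
    diff = np.abs(rainy.astype(np.float32) - clean.astype(np.float32))
    return (diff.max(axis=-1) > thresh).astype(np.uint8)
```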
Step S2.3: perform edge detection in combination with the raindrop image to improve the detection precision of the raindrop regions and finally obtain the saliency map.
Detecting the rainy image through image saliency accurately locates the raindrop regions and their edges, so the network attends markedly to the raindrop regions in the image; the resulting saliency map marks the raindrop regions to be repaired.
Step S3: Concat-fuse the rainy image and the saliency map to generate a fused image.
In the invention, Concat fusion joins its inputs along the channel dimension: it does not add information within each feature layer but increases the amount of information across dimensions, so the overall feature information of the image grows. The fused image therefore carries richer feature information, which greatly benefits the realism of the generated de-rained image and improves training.
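In framework terms this is a single channel-wise concatenation; a PyTorch sketch (the 480 × 720 resolution matches the generator output described below):

```python
import torch

rainy = torch.rand(1, 3, 480, 720)     # rainy image, (N, C, H, W)
saliency = torch.rand(1, 1, 480, 720)  # single-channel saliency map
# Concat stacks the inputs along the channel dimension: nothing is summed or
# averaged, so every input channel survives intact in the fused tensor.
fused = torch.cat([rainy, saliency], dim=1)
print(fused.shape)  # torch.Size([1, 4, 480, 720])
```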
Step S4: the fused image generated in step S3 is sent to the multi-scale fusion generative adversarial network (hereinafter the MsF-GAN network; its structure is shown in fig. 3) for de-raining training, yielding an image de-raining network model.
Step S4.1: input the fused image into the generator network of the MsF-GAN to generate a de-rained image G(R), and define the generator's multi-scale loss function.
The de-raining network has 16 layers (l). Layers l1-l6, l11, l12, l14 and l16 are Conv-ReLU modules (Conv: convolution; ReLU: activation function); layers l7-l10 are Dilation-ReLU modules (dilated, i.e. "hole", convolution); and l13 and l15 are Deconv-avg_p-ReLU modules (Deconv: deconvolution; avg_p: average pooling). Skip connections are added between l2 and l15 to prevent blurred outputs. Layers l7-l10 introduce dilation to enlarge the receptive field of the network so that it can extract rich image features. To avoid losing important image features and semantic information as the network deepens while repairing the rain region, the idea of multi-scale fusion with Concat is used to keep the generated de-rained image more complete and realistic. For the fusion, layers l12, l14 and l16 are each processed by Deconv, batch normalization (BN) and a ReLU activation function. The height × width × channel dimensions of l12, l14 and l16 are 60 × 90 × 256, 120 × 180 × 128 and 240 × 360 × 32 respectively; l12, l14 and l16 are first up-sampled with Deconv, then batch normalization and ReLU activations are added so that the Concat fusion takes full effect, and finally the three layers are fused, outputting a de-rained image of height × width × channel 480 × 720 × 9.
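A PyTorch sketch of this fusion stage is given below. The feature-map sizes of l12, l14 and l16 and the 9-channel fused output come from the description above; the kernel/stride choices, the 3-channel-per-branch split, and the final projection back to RGB are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleFusionHead(nn.Module):
    """Upsample l12, l14, l16 by Deconv + BN + ReLU, then Concat-fuse them."""

    def __init__(self):
        super().__init__()

        def branch(in_ch: int, scale: int) -> nn.Sequential:
            # kernel_size == stride gives an exact integer upsampling factor
            return nn.Sequential(
                nn.ConvTranspose2d(in_ch, 3, kernel_size=scale, stride=scale),
                nn.BatchNorm2d(3),
                nn.ReLU(inplace=True),
            )

        self.up12 = branch(256, 8)  # 256 x  60 x  90 -> 3 x 480 x 720
        self.up14 = branch(128, 4)  # 128 x 120 x 180 -> 3 x 480 x 720
        self.up16 = branch(32, 2)   #  32 x 240 x 360 -> 3 x 480 x 720
        self.to_rgb = nn.Conv2d(9, 3, kernel_size=3, padding=1)  # assumed projection

    def forward(self, l12, l14, l16):
        fused = torch.cat([self.up12(l12), self.up14(l14), self.up16(l16)], dim=1)
        return self.to_rgb(fused)  # generated de-rained image G(R)

head = MultiScaleFusionHead()
out = head(torch.rand(1, 256, 60, 90),
           torch.rand(1, 128, 120, 180),
           torch.rand(1, 32, 240, 360))
print(out.shape)  # torch.Size([1, 3, 480, 720])
```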
The generator has two further loss components: a multi-scale loss and a perceptual loss. The multi-scale loss function is defined as:

L_m({S}, {T}) = \sum_i \lambda_i L_{MSE}(S_i, T_i)

where L_{MSE} denotes the mean squared error, S_i the i-th output extracted from the generator layers, T_i the corresponding clear rain-free image at the same scale as S_i, and \lambda_i the weights of the different scales. Since the l12, l14 and l16 images are 1/4, 1/2 and 1 of the original image size respectively, and information at the smaller scales is relatively less important than at the larger ones, the \lambda_i in the multi-scale loss of the MsF-GAN generator network may be set to 0.6, 0.8 and 1.
The perceptual loss function is defined as:

L_p(O, T) = L_{MSE}(VGG(O), VGG(T))

where VGG-16 is a pre-trained convolutional neural network that produces a feature-space representation (VGG features) from a given input image, O is the de-rained image output by the generator, and T is the corresponding clear rain-free image. The total loss of the MsF-GAN generator network is therefore:

L_G = 10^{-2} L_{GAN}(O) + L_m({S}, {T}) + L_p(O, T)

where L_{GAN}(O) = log(1 - D(O)).
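The three generator terms translate directly into code; a sketch assuming PyTorch and a frozen torchvision VGG-16 (truncating the VGG at layer 16, relu3_3, and omitting ImageNet input normalization are simplifying assumptions):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG-16 feature extractor for the perceptual loss.
_vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def generator_loss(d_out, scales_s, scales_t, O, T, lambdas=(0.6, 0.8, 1.0)):
    """L_G = 1e-2 * log(1 - D(O)) + sum_i lambda_i * L_MSE(S_i, T_i) + L_MSE(VGG(O), VGG(T)).

    scales_s are the generator outputs S_i at 1/4, 1/2 and full resolution,
    scales_t the clean image T resized to match; d_out is D(O) in (0, 1)."""
    l_gan = torch.log(1.0 - d_out + 1e-8).mean()           # adversarial term
    l_multi = sum(lam * F.mse_loss(s, t)                   # multi-scale term
                  for lam, s, t in zip(lambdas, scales_s, scales_t))
    l_perc = F.mse_loss(_vgg(O), _vgg(T))                  # perceptual term
    return 1e-2 * l_gan + l_multi + l_perc
```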
Step S4.2: input the de-rained image G(R) generated in step S4.1 and the corresponding clear image T into the discriminator, compute the discriminator's output probability P_clean to determine whether the input comes from the corresponding clear rain-free image T or from the fake, generated de-rained image G(R), and define the discriminator's loss function.
The discriminator network consists of 7 convolutional layers with kernel size (3, 3); the fully connected layer has 1024 units, and a single neuron with a sigmoid activation function produces the output. Features are extracted from the penultimate convolutional layer and fed to a convolutional neural network; the output of this CNN is multiplied with the original features inside the discriminator network and passed to the next layer, guiding the discriminator to focus on the rain region determined by the saliency map. Finally, a fully connected layer judges whether the input image is real or fake.
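A structural sketch of this discriminator follows. Only the 7-convolution count, the 1024-unit fully connected layer, the sigmoid output and the attention-by-multiplication scheme come from the text; channel widths, strides and the attention CNN's kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttentiveDiscriminator(nn.Module):
    """7-conv discriminator whose penultimate features are re-weighted by a
    small CNN so attention concentrates on the saliency-marked rain region."""

    def __init__(self):
        super().__init__()
        chans = [3, 8, 16, 32, 64, 128, 128, 64]
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 5, stride=2 if i < 5 else 1, padding=2),
                nn.LeakyReLU(0.2),
            )
            for i in range(7)
        )
        # Attention CNN: penultimate features -> single-channel map D_map.
        self.att = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 1, 3, padding=1), nn.Sigmoid())
        self.fc = nn.Sequential(nn.Flatten(), nn.LazyLinear(1024), nn.LeakyReLU(0.2),
                                nn.Linear(1024, 1), nn.Sigmoid())

    def forward(self, x):
        for conv in self.convs[:6]:
            x = conv(x)
        att_map = self.att(x)           # D_map, supervised by the saliency map
        x = self.convs[6](x * att_map)  # multiply attention back onto the features
        return self.fc(x), att_map
```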
The loss between the saliency map extracted from the inner layers of the discriminator (the output of the above convolutional neural network) and the image saliency map is defined as:

L_{map}(O, T) = L_{MSE}(D_{map}(O), M) + L_{MSE}(D_{map}(T), 0)

where D_{map} denotes the two-dimensional saliency map produced by the discriminator, M is the saliency map of the rainy image, T is the corresponding clear rain-free image, and 0 denotes a map containing only zero values. The overall loss function of the discriminator is:

L_D(O, T) = -log(D(T)) - log(1 - D(O)) + \gamma L_{map}(O, T)

where L_{map} is the saliency-map loss defined above, and the parameter \gamma is set to 0.05.
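With the attention map exposed by the discriminator, the full loss is a few lines. A sketch reusing the notation above; the saliency map M is assumed pre-resized to D_map's spatial resolution.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_fake, d_real, dmap_fake, dmap_real, saliency, gamma=0.05):
    """L_D = -log D(T) - log(1 - D(O)) + gamma * L_map(O, T), where
    L_map = L_MSE(D_map(O), M) + L_MSE(D_map(T), 0): the map predicted for a
    generated image should match the saliency map M, while the map for a real
    clean image should be all zeros (it contains no rain region)."""
    l_adv = -(torch.log(d_real + 1e-8) + torch.log(1.0 - d_fake + 1e-8)).mean()
    l_map = (F.mse_loss(dmap_fake, saliency)
             + F.mse_loss(dmap_real, torch.zeros_like(dmap_real)))
    return l_adv + gamma * l_map
```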
Step S4.3: the generator and the discriminator are trained adversarially, with the parameters of generator G and discriminator D updated alternately; the batch size and initial learning rate are set to 1 and 0.0002 respectively, and training runs for 20000 iterations. When the discriminator can no longer correctly estimate whether its input comes from the generated de-rained image G(R) or from the corresponding clear rain-free image, a network model capable of removing rain from the rainy images in the training set is obtained.
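The alternating update with the stated hyperparameters can be sketched as below, reusing the loss functions from the sketches above. Adam is an assumed optimizer choice, and `generator`, `discriminator` and the endless `loader` (yielding fused input, clean target and a saliency map already resized to the attention-map resolution) are assumed to exist.

```python
import torch

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)  # batch size 1 assumed in loader
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step, (fused, clean, saliency) in zip(range(20000), loader):
    fake = generator(fused)                     # G(R)
    # Discriminator step: learn to separate G(R) from the clear rain-free image.
    d_fake, map_fake = discriminator(fake.detach())
    d_real, map_real = discriminator(clean)
    loss_d = discriminator_loss(d_fake, d_real, map_fake, map_real, saliency)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: fool the updated discriminator (only the full-scale term
    # is shown; the complete version also passes the l12/l14-scale outputs).
    d_fake, _ = discriminator(fake)
    loss_g = generator_loss(d_fake, [fake], [clean], fake, clean, lambdas=(1.0,))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```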
The discriminator and generator networks play a game against each other, and judging images as real or fake effectively improves the quality of the generated de-rained images. Concat multi-scale fusion of the l12, l14 and l16 layers in the generator network better preserves the consistency of the image's semantic information and texture structure during de-raining, avoids blurring, and improves the quality of the generated de-rained image. Combining global and local discrimination in the discriminator network identifies the authenticity of the generated de-rained image more reliably.
Step S5: input the test set into the trained multi-scale fusion generative adversarial network model to obtain de-rained images, completing the de-raining of single images, and evaluate the generated de-rained images by the SSIM and PSNR indices.
The de-raining results of this embodiment are shown in fig. 4: (a) is a rainy image, (b) is the de-rained image of this embodiment, and (c) is the clear rain-free image corresponding to (a). The comparison between the network's de-rained image and the clear rain-free image shows that the algorithm of the invention is clearly effective.
SSIM is defined as:

SSIM(x, y) = l(x, y)^\alpha \cdot c(x, y)^\beta \cdot s(x, y)^\gamma

where x and y are the reference image and the image under test; the luminance, contrast and structure terms l, c and s are computed from the means \mu_x and \mu_y, the standard deviations \sigma_x and \sigma_y, and the covariance \sigma_{xy} of the images x and y, with small positive constants c_1, c_2 and c_3 for numerical stability. A larger SSIM value indicates smaller image distortion.
PSNR is defined as:

PSNR = 10 \cdot log_{10}(MAX_1^2 / MSE)

where MSE is the mean squared error between the original image and the processed image, and MAX_1 is the maximum pixel value of the image; for pixel values represented in B-bit binary, MAX_1 = 2^B - 1. For color images, the MSE is computed over the three RGB channels and divided by 3. A larger PSNR value indicates smaller image distortion.
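Both indices have reference implementations in scikit-image; a minimal evaluation sketch for uint8 RGB images:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(derained: np.ndarray, clean: np.ndarray):
    """Return (PSNR, SSIM) of a de-rained result against its clear reference.

    PSNR = 10 * log10(MAX^2 / MSE) with MAX = 255 for 8-bit images; SSIM combines
    the luminance, contrast and structure terms. Higher is better for both."""
    psnr = peak_signal_noise_ratio(clean, derained, data_range=255)
    ssim = structural_similarity(clean, derained, channel_axis=-1, data_range=255)
    return psnr, ssim
```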
Fig. 5 compares the de-raining effect of MsF-GAN and three other deep-learning-based de-raining methods on the same image in this embodiment: (a) is the rainy image and (b) the clear rain-free image; (c), (d) and (e) are the de-rained images generated by methods A, B and C respectively, and (f) is the de-rained image generated by MsF-GAN. The comparison shows that method A leaves significant rain residue, method B leaves a certain amount of raindrops and a blurry image, and the raindrop regions processed by method C remain blurry, whereas the de-rained image obtained through the MsF-GAN network has almost no raindrop residue and gives the best result.
Table 1 compares the PSNR and SSIM indices of the de-rained images in fig. 5; the method of the invention scores higher than the other methods on both indices, and the comparison experiment verifies its effectiveness.
TABLE 1 evaluation results of image restoration quality by different methods
Claims (1)
1. A single-image rain removal method based on a multi-scale fusion generative adversarial network, characterized by comprising the following steps:
s1, establishing a data set comprising a training set and a test set, the training set and the test set each comprising image pairs of a rainy image and a corresponding clear rain-free image, the image pairs being obtained by preprocessing original images;
the step S1 includes the steps of:
s1.1, selecting image pairs each consisting of a rainy image and a corresponding clear rain-free image;
s1.2, performing brightness and contrast enhancement on the image pairs of step S1.1;
s1.3, performing image flipping on the image pairs of steps S1.1 and S1.2;
s1.4, forming a data set from the image pairs preprocessed in steps S1.1 to S1.3, and dividing the data set into the training set and the test set in a certain proportion;
s2, generating a saliency map through saliency detection from the rainy images in the training set: first performing superpixel segmentation of the rainy image in the training set with the SLIC algorithm, then determining raindrop regions through image saliency and detecting raindrop pixels in combination with a raindrop model, and then performing edge detection on the rainy image to refine the raindrop regions and generate the saliency map;
s3, fusing the saliency map generated in step S2 with the rainy images in the training set by Concat (concatenation along the channel dimension) to generate a fused image;
s4, sending the fused images generated in step S3 into the multi-scale fusion generative adversarial network for de-raining training to obtain a network model capable of removing rain from the rainy images in the training set;
the step S4 includes the steps of:
s4.1, inputting the fused image generated in step S3 into the generator network of the multi-scale fusion generative adversarial network;
the generator network of the multi-scale fusion generative adversarial network comprises 16 convolutional layers; l12, l14 and l16 are up-sampled by a Deconv function, batch normalization and ReLU activation functions are added to each, the three layers l12, l14 and l16 are then fused by the Concat method, and the generated de-rained image G(R) is output;
the loss function of the generator network of the multi-scale fusion generative adversarial network comprises an adversarial term, a multi-scale loss and a perceptual loss, and is defined as:

L_G = 10^{-2} L_{GAN}(O) + L_m({S}, {T}) + L_p(O, T)

where L_{GAN}(O) = log(1 - D(O)); the multi-scale loss function is

L_m({S}, {T}) = \sum_i \lambda_i L_{MSE}(S_i, T_i)

where L_{MSE} denotes the mean squared error, S_i denotes the i-th output extracted from the generator layers, T_i denotes the corresponding clear rain-free image at the same scale as S_i, and \lambda_i are the weights of the different scales; the perceptual loss function is:

L_p(O, T) = L_{MSE}(VGG(O), VGG(T))

where VGG denotes the feature-space representation generated from a given input image, O is the de-rained image output by the generator, and T is the clear rain-free image corresponding to the rainy image;
s4.2, inputting the de-rained image G(R) generated in step S4.1 and the clear rain-free image corresponding to it into the discriminator network of the multi-scale fusion generative adversarial network, computing the discriminator network's output probability P_clean, and determining whether the input comes from the clear rain-free image or from the generated de-rained image G(R);
the discriminator network of the multi-scale fusion generative adversarial network comprises 7 convolutional layers; global discrimination, which checks the consistency of the whole image, is combined with local discrimination, which checks specific regions; features are extracted from the penultimate convolutional layer and provided to a convolutional neural network, the output of the convolutional neural network is multiplied with the original features in the discriminator network and then passed into the next layer, guiding the discriminator network to focus on the rain region determined by the saliency map, and finally a fully connected layer judges whether the input image is real or fake;
the loss function of the discriminator network is:

L_D(O, T) = -log(D(T)) - log(1 - D(O)) + \gamma L_{map}(O, T)

where L_{map}, the loss on the saliency map generated in the discriminator network, is defined as:

L_{map}(O, T) = L_{MSE}(D_{map}(O), M) + L_{MSE}(D_{map}(T), 0)

where D_{map} denotes the two-dimensional saliency map generated by the discriminator network, M is the saliency map of the rainy image, T is the clear rain-free image, and 0 denotes a map containing only zero values;
s4.3, the generator network and the discriminator network are trained adversarially, alternately updating their parameters; when the discriminator network can no longer correctly estimate whether the input comes from the generated de-rained image G(R) or from the corresponding clear rain-free image, a network model capable of removing rain from the rainy images in the training set is obtained;
and S5, inputting the rainy images in the test set into the network model to obtain de-rained images, and evaluating the effect of the network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011385947.0A CN112348762A (en) | 2020-11-30 | 2020-11-30 | Single-image rain removal method based on a multi-scale fusion generative adversarial network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011385947.0A CN112348762A (en) | 2020-11-30 | 2020-11-30 | Single-image rain removal method based on a multi-scale fusion generative adversarial network
Publications (1)
Publication Number | Publication Date |
---|---|
CN112348762A true CN112348762A (en) | 2021-02-09 |
Family
ID=74427357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011385947.0A Withdrawn CN112348762A (en) | 2020-11-30 | 2020-11-30 | Single image rain removing method for generating confrontation network based on multi-scale fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112348762A (en) |
-
2020
- 2020-11-30 CN CN202011385947.0A patent/CN112348762A/en not_active Withdrawn
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113393385A (en) * | 2021-05-12 | 2021-09-14 | 广州工程技术职业学院 | Unsupervised rain removal method, system, device and medium based on multi-scale fusion |
CN113393385B (en) * | 2021-05-12 | 2024-01-02 | 广州工程技术职业学院 | Multi-scale fusion-based unsupervised rain removing method, system, device and medium |
CN113313169A (en) * | 2021-05-28 | 2021-08-27 | 中国人民解放军战略支援部队航天工程大学 | Training material intelligent identification method, device and equipment based on deep learning |
CN117196985A (en) * | 2023-09-12 | 2023-12-08 | 军事科学院军事医学研究院军事兽医研究所 | Visual rain and fog removing method based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111915530B (en) | End-to-end-based haze concentration self-adaptive neural network image defogging method | |
Cao et al. | Underwater image restoration using deep networks to estimate background light and scene depth | |
CN110689599B (en) | 3D visual saliency prediction method based on non-local enhancement generation countermeasure network | |
CN112348762A (en) | Single-image rain removal method based on a multi-scale fusion generative adversarial network | |
CN111160249A (en) | Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion | |
CN114663346A (en) | Strip steel surface defect detection method based on improved YOLOv5 network | |
CN111768388A (en) | Product surface defect detection method and system based on positive sample reference | |
CN113011357A (en) | Depth fake face video positioning method based on space-time fusion | |
CN113870128B (en) | Digital mural image restoration method based on depth convolution countermeasure network | |
Chen et al. | Multi-scale adaptive dehazing network | |
CN113808031A (en) | Image restoration method based on LSK-FNet model | |
CN111242026A (en) | Remote sensing image target detection method based on spatial hierarchy perception module and metric learning | |
CN114266894A (en) | Image segmentation method and device, electronic equipment and storage medium | |
CN117237279A (en) | Blind quality evaluation method and system for non-uniform distortion panoramic image | |
CN113284061A (en) | Underwater image enhancement method based on gradient network | |
CN115239672A (en) | Defect detection method and device, equipment and storage medium | |
CN116664446A (en) | Lightweight dim light image enhancement method based on residual error dense block | |
Babu et al. | An efficient image dahazing using Googlenet based convolution neural networks | |
CN114021704A (en) | AI neural network model training method and related device | |
CN113781375A (en) | Vehicle-mounted vision enhancement method based on multi-exposure fusion | |
CN117132503A (en) | Method, system, equipment and storage medium for repairing local highlight region of image | |
CN116721288A (en) | Helmet detection method and system based on YOLOv5 | |
CN116758449A (en) | Video salient target detection method and system based on deep learning | |
CN114783020B (en) | Dynamic face recognition method based on novel countermeasure learning deblurring theory | |
CN114820381A (en) | Digital image restoration method based on structure information embedding and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20210209 |