CN112258402A - Dense residual generative adversarial network for fast rain removal - Google Patents
- Publication number: CN112258402A (application CN202011061765.8A)
- Authority: CN (China)
- Prior art keywords: rain, network, discriminator, generator, loss
- Prior art date
- Legal status: Pending (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/70
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/045: Combinations of networks
- G06T2207/30252: Vehicle exterior; Vicinity of vehicle
Abstract
The invention discloses a dense residual generative adversarial network for fast rain removal, comprising the following steps: a. building the algorithm operating environment; b. building a deraining dataset; c. designing a dense residual generator sub-network for fast rain removal; d. designing a dense residual discriminator sub-network for fast rain removal; e. designing the loss functions of the discriminator and the generator; f. performing image post-processing and index calculation, and verifying the rain removal effect of the algorithm model. The invention builds the operating and test environment of the whole algorithm, and the designed deraining model reduces time consumption as much as possible while ensuring the rain removal effect and preserving sufficient background detail. According to the test results, the deraining time per image is about 0.02s, greatly improving deraining efficiency.
Description
Technical Field
The invention relates to the technical field of visual environment perception for unmanned vehicle driving, and in particular to a dense residual generative adversarial network for fast rain removal.
Background
For outdoor vision tasks such as vision-based unmanned-vehicle environment perception, pedestrian and pavement-marking detection, tracking, and road monitoring, clean, clear and visible images are essential. Reliable visual images help these tasks be completed more accurately and efficiently, while reducing or even avoiding unnecessary errors. However, under severe weather conditions, especially rain, image visibility is severely degraded, which greatly hinders visual tasks. Research on image deraining technology is therefore urgent. The task is nevertheless very challenging. First, in heavy rain the rain water brings fog that blurs the image, and it is difficult to keep background details clear while removing rain streaks. Second, for visual tasks that require real-time deraining, such as vision-based environment perception for unmanned vehicles, achieving fast deraining and reducing the per-frame time consumption is also a difficult problem.
Deraining technology currently targets two main categories of input: video and single images. For video deraining, a large amount of temporal prior information is available, so rain-streak residue can be detected and removed relatively easily. Single-image deraining, by contrast, must work from one image alone and lacks such priors, making it harder. Scholars have proposed single-image deraining methods, which divide mainly into model-based and deep-learning-based approaches. Model-based methods include guided filtering, low-rank appearance models, non-local mean filtering, Gaussian mixture models, contrast enhancement, and scattering restoration. In the guided-filtering method, the difference image between the maximum and minimum of each pixel's RGB channels serves as the guide image, and the source image is guided-filtered to obtain the derained image. The scattering-restoration method continuously scans nearby pixels along scattering lines to fill the damaged derained region, but the scanning region must be set manually, and black spots easily appear under the influence of light.
Besides the model-based methods above, there are deep-learning-based methods such as dictionary-learning sparse coding, CNN + RNN, and GAN. Deep-learning methods have achieved great success in computer vision in recent years and are used not only for image deraining but also for image denoising, dehazing, recognition, detection, translation, and other fields. Although deep-learning methods perform outstandingly, they are time-consuming, and problems such as instability, non-convergence, and low diversity easily arise during network training.
Disclosure of Invention
The invention aims to provide a dense residual generative adversarial network for fast rain removal, to solve problems such as rain-streak residue and excessive time consumption in deraining images for visual environment perception, to preserve sufficient background detail while ensuring the rain removal effect, to reduce deraining time as much as possible, and to lay a foundation for real-time deraining while an unmanned vehicle is driving.
The invention builds the training environment of the overall algorithm; the main environment parameters are given in step a below.
the technical scheme of the invention is that a dense residual error generation countermeasure network for rapidly removing rain is designed by taking a central idea of generation of the countermeasure network, namely a game theory, as a main idea, and an integral algorithm operation environment, namely a training environment and a testing environment, is firstly established; the design of the whole algorithm framework is carried out, and the design comprises the preprocessing of a data set, a generator sub-network, a discriminator sub-network, a loss function, the image post-processing and an index calculation function.
The method specifically comprises the following steps:
a. building an algorithm operation environment;
b. building a rain removing data set;
c. designing a dense residual error generator sub-network for fast rain removal;
d. designing a dense residual discriminator subnetwork for quickly removing rain;
e. designing a loss function of a discriminator and a generator;
f. and carrying out image post-processing and index calculation, and verifying the rain removal effect of the algorithm model.
Further, in step a, the Linux system is installed first, and the graphics driver is installed according to the computer's graphics card; the driver version currently used is NVIDIA-Linux-x86_64-440.44.run. After the driver is installed, the TensorFlow version is determined; the model is designed and built on TensorFlow 2.1.0. The CUDA and cuDNN versions are determined by the driver and TensorFlow versions; per the compatibility table, CUDA Toolkit 10.1 and cuDNN 7.6 are used. All algorithm models are designed in this environment; if the versions do not match, problems can occur at run time.
Further, step b includes:
b1, first building a synthetic rain image dataset: considering the size, direction and density of rain streaks, rain streaks are added with Photoshop to a collected set of rain-free images;
b2, then building a real rain image dataset, collecting rain images from the internet via a web crawler and by photographing real rain scenes;
b3, mixing the real and synthetic rain image datasets and splitting them into training and test sets in a 7:3 ratio.
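The shuffle-and-split in step b3 can be sketched as follows; this is a minimal illustration, and the function and variable names are not from the patent:

```python
import random

def split_dataset(pairs, train_ratio=0.7, seed=0):
    """Shuffle the mixed real + synthetic (rain, clean) pairs and
    split them into training and test sets in a 7:3 ratio."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]
```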
Further, step c includes:
c1, preprocessing the input images: randomly cropping to the specified size of 250×250, randomly flipping, and normalizing pixel values into the range [-1, 1]; this data augmentation and normalization helps the model generalize and avoids overfitting;
c2, randomly shuffling the input dataset and feeding it into the generator sub-network;
c3, passing the input image through the LSTM (long short-term memory) module of the generator sub-network, which decides which pixel values and feature points are forgotten, input and output;
c4, after the LSTM's convolution and feature selection, feeding the result into the generator's dense residual network.
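The preprocessing of step c1 (random 250×250 crop, random flip, normalization to [-1, 1]) can be sketched in NumPy; the function name and the flip/crop details are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(img, crop=250):
    """Randomly crop a uint8 H x W x 3 image to crop x crop, randomly
    flip it horizontally, and scale pixel values into [-1, 1]."""
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:          # random horizontal flip
        patch = patch[:, ::-1]
    return patch.astype(np.float32) / 127.5 - 1.0
```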
Further, in step c4, the dense residual network is an optimized design based on the residual network; its main function is to extract and transfer features rapidly, greatly reducing time consumption while ensuring that rain streaks are removed completely. Whereas an ordinary residual network takes two convolutional layers and two activation layers as one module and transfers features across two layers at a time, the dense residual network trains with a single convolutional layer plus a PReLU activation function as a unit and transfers features between alternating layers, preserving as much feature information of the original image background as possible while removing rain.
Further, step d includes:
d1, applying spectral normalization to the convolution kernels during the discriminator sub-network's convolutions, so that the data distribution satisfies the Lipschitz constraint;
d2, inputting the rain-free image produced by the generator into the discriminator sub-network for convolution, feature extraction and judgment; feature maps are generated between discriminator levels and used together with the generator's feature map to compute the discriminator loss, supervising the discriminator to focus its judgment on the rain-streak regions of the original rain image as much as possible;
d3, the discriminator network contains 13 convolutional layers with kernel size 5 and two fully connected layers, the first outputting 1024 units and the second outputting 1, which judges whether the input is a real rain-free image; during feature extraction, the output of the eighth convolutional layer is taken as the feature map and later multiplied with the original features as a guide for judging the rain-streak region;
d4, the discriminator outputs its judgment; the discriminator output, the generator output and the feature map are fed into the loss-function module, where the losses are computed and optimized.
Further, step e includes:
e1, the generator loss function contains three parts: adversarial loss, multi-scale structural similarity loss, and multi-scale Euclidean distance loss;
e2, the discriminator loss function contains three parts: cross-entropy loss, feature-mapping loss, and R2 regularization loss;
e3, the discriminator loss is computed first, and its gradient is used to optimize the discriminator; after a set number of training steps the generator loss is computed and the generator optimized, the generator being trained once for every three discriminator training steps.
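The alternating schedule of step e3 (three discriminator updates per generator update) can be sketched as follows; the exact interleaving is not specified beyond the 3:1 ratio, so this is one plausible reading:

```python
def training_schedule(total_disc_steps):
    """Return the update sequence: the discriminator ('D') is updated
    at every step, and the generator ('G') once every third step."""
    ops = []
    for step in range(total_disc_steps):
        ops.append('D')
        if (step + 1) % 3 == 0:
            ops.append('G')
    return ops
```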
Further, step f includes:
f1, post-processing the images and computing the indexes after a set number of steps;
f2, saving the model and related parameters after a set number of steps.
Further, in step f1, the image indexes use two evaluation metrics: peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).
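PSNR, one of the two metrics named in step f1, can be computed as below (MS-SSIM is more involved; frameworks such as TensorFlow provide it as tf.image.ssim_multiscale). The helper is a generic definition, not code from the patent:

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```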
The invention has the beneficial effects that:
(1) A generator sub-network combining dense residuals and LSTM is designed. Unlike existing methods, the dense residual network takes a single convolutional layer plus activation layer as its basic module, transfers feature information across alternating layers, and uses multiple iterations for feature extraction and rain-free image generation. Because the overall network structure is simplified and feature connectivity is strengthened, the network can generate rain-free images that satisfy the environment-perception task, and the deraining speed is greatly improved compared with existing deraining methods.
(2) In the generator and the discriminator, rain feature maps are designed and generated to guide both networks to better attend to rain regions during training, so that the generator removes rain in a targeted way and the discriminator judges in a targeted way whether the rain region of the generated image is realistic. The discriminator uses spectral normalization to satisfy the Lipschitz constraint.
(3) Regarding the loss functions: besides cross-entropy loss, the discriminator loss adds an R2 regularization loss to avoid non-convergence caused by discontinuous generated distributions, as well as an L2 loss between the two sub-networks' feature maps. The generator loss contains three parts: adversarial loss, a feature loss between the generator feature map and the target image, and a negative structural similarity loss.
Drawings
FIG. 1 is a schematic flow chart of the fast rain removal dense residual generative adversarial network of the present invention;
FIG. 2 is a framework diagram of the fast rain removal dense residual generative adversarial network of the present invention;
FIG. 3 is a schematic diagram of the LSTM model used in the generator sub-network of the present invention;
FIG. 4 is a schematic diagram of the dense residual network designed by the present invention;
FIG. 5 is a schematic diagram of the discriminator sub-network structure of the present invention;
FIG. 6 is a diagram illustrating the basic idea of the generative adversarial network used by the present invention;
FIG. 7 is the overall algorithm flow framework of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", etc. indicate orientations or positional relationships based on those shown in the drawings or orientations or positional relationships that the products of the present invention conventionally use, which are merely for convenience of description and simplification of description, but do not indicate or imply that the devices or elements referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly and may, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
As shown in fig. 1, the present invention provides a dense residual generative adversarial network for fast rain removal, comprising the steps of:
a. building an algorithm operation environment;
b. building a rain removing data set;
c. designing a dense residual error generator sub-network for fast rain removal;
d. designing a dense residual discriminator subnetwork for quickly removing rain;
e. designing a loss function of a discriminator and a generator;
f. and carrying out image post-processing and index calculation, and verifying the rain removal effect of the algorithm model.
Specifically, as shown in fig. 2, the present invention proposes a dense residual generative adversarial network for fast rain removal, with feature supervision between iteration levels in the generator. Unlike other networks, to avoid network complexity, reduce time consumption and improve deraining efficiency, a generator containing a dense residual network is designed; multiple iterations are performed on the basis of the LSTM and the dense residual network, while unnecessary batch normalization operations are reduced. Whereas an ordinary residual network takes two convolutional layers and two activation layers as one module and transfers features across two layers at a time, the dense residual network trains with a single convolutional layer plus PReLU activation as a unit and transfers features between alternating layers, preserving as much background feature information of the original image as possible while removing rain. To exploit the feature information between the generator's iteration levels to help the GAN's discriminator judge, a feature map is also generated between discriminator levels and used together with the generator's feature map to compute the discriminator loss, supervising the discriminator to focus its judgment on the rain-streak regions of the original rain image as much as possible. To increase the stability of adversarial training and accelerate convergence, spectral normalization is adopted in the discriminator network to realize the Lipschitz constraint.
The discriminator loss comprises the cross-entropy loss between the real and generated distributions, a feature-mapping loss, and a regularization loss. The generator loss comprises three parts: adversarial loss, feature supervision loss, and structural similarity loss. We define the rain image as the combination of a background image and rain streaks:
I=B+R
where B denotes the clean background image, R the degradation caused by rain streaks, and I the color input rain image, representing the complex blend of the background environment with the reflections caused by the rain streaks.
Taking the generative adversarial network as its basic idea, the invention designs a fast dense residual generative adversarial network whose structure is shown in fig. 2. The network consists of two parts: a generator sub-network and a discriminator sub-network. The generator's main function is to generate, from a rain image, a realistic clean rain-free image as quickly as possible. The discriminator judges whether the generated image it receives is a real image.
The specific adversarial cost function is as follows:

min_G max_D V(D, G) = E_(y~Pr)[log D(y)] + E_(x~Pg)[log(1 - D(G(x)))]

where D represents the discriminator sub-network and G the generator sub-network; Pr represents the real distribution, with y a sample of the real distribution representing a clean, rain-free background image; Pg represents the generated distribution, with x a sample of it representing the original rain image corresponding to y.
To improve deraining efficiency as much as possible and make effective use of the image's feature information, the generator sub-network structure is simplified and consists mainly of two parts: the Long Short-Term Memory network (LSTM) shown in FIG. 3 and the Dense Residual Network (DRN) shown in FIG. 4. The LSTM is a kind of recurrent neural network (RNN), consisting mainly of a forget gate, an input gate, and an output gate. Although it is not amenable to parallelization, the particular advantages of its structure make it very effective for processing sequences, retaining useful information, and using feature information between levels for long-range prediction. The dense residual network DRN is designed on the basis of ResNet. To improve the utilization of feature information as much as possible, remove rain streaks in a targeted way, and minimize deraining time, a dense residual network with alternating-layer connections is designed. The final output of the dense residual network is taken as a feature map and fed, together with the original rain image, back into the generator sub-network to guide deraining; the output of the last iteration is the rain-free image generated by the generator.
The input data x_t and the previous hidden state h_(t-1) are concatenated, and a sigmoid activation function determines which data to forget. The concatenated data also passes through the input gate, whose sigmoid activation determines which data to update, while a tanh function creates candidate values for the update. The old cell state is multiplied by the forget gate's output and added to the new candidate values to obtain the new cell state. Finally comes the output gate, which also contains sigmoid and tanh activations: the cell state is mapped into the range [-1, 1] by the tanh activation and then multiplied by the sigmoid output, which determines which parts of the cell state are output. The processing is as follows:
f_t = σ(W_f · [h_(t-1), x_t] + b_f)
i_t = σ(W_i · [h_(t-1), x_t] + b_i)
g_t = tanh(W_g · [h_(t-1), x_t] + b_g)
o_t = σ(W_o · [h_(t-1), x_t] + b_o)
c_t = f_t · c_(t-1) + i_t · g_t
h_t = o_t · tanh(c_t)
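The six equations above describe one LSTM step; a minimal NumPy sketch, with a hypothetical per-gate weight layout, is:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing the equations above. W and b hold one
    weight matrix / bias vector per gate, keyed 'f', 'i', 'g', 'o'
    (a hypothetical layout; the patent does not fix one)."""
    z = np.concatenate([h_prev, x_t])      # [h_(t-1), x_t]
    f = sigmoid(W['f'] @ z + b['f'])       # forget gate f_t
    i = sigmoid(W['i'] @ z + b['i'])       # input gate i_t
    g = np.tanh(W['g'] @ z + b['g'])       # candidate values g_t
    o = sigmoid(W['o'] @ z + b['o'])       # output gate o_t
    c = f * c_prev + i * g                 # new cell state c_t
    h = o * np.tanh(c)                     # new hidden state h_t
    return h, c
```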
The dense residual network aims at fast feature extraction and deraining; its greatest advantages are low time consumption and high feature utilization. Its input is the output of the LSTM; during each iteration the output becomes the input of the next generator pass, and the output of the last iteration is taken as the generator's rain-free image. The dense residual network contains 10 convolution + activation units, and skip connections are used to improve the utilization of pixel information and avoid blurred output. These skip connections differ from those of the original residual network, which treats two convolutions plus activations as one unit and connects feature values between units. Besides the feature transfer performed two units after the first convolution, the dense residual network designed here also transfers features on the other branch after the second convolution, equivalent to transferring the feature map after every other convolution + activation unit, improving pixel-value utilization without adding excessive time consumption.
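The alternating-unit skip pattern described above can be sketched as follows, using single-channel 3×3 convolutions for brevity; the actual channel counts and kernel sizes of the patent's DRN are not reproduced, so this only illustrates the connection pattern:

```python
import numpy as np

def conv3x3(x, w):
    # 'same'-padded 3x3 convolution on a single-channel image
    h, wd = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * w)
    return out

def prelu(x, a=0.25):
    return np.where(x > 0, x, a * x)

def dense_residual_forward(x, kernels):
    """Chain of conv + PReLU units in which the accumulated features
    are added back (and the skip updated) after every other unit,
    i.e. feature transfer between alternating layers."""
    feats, skip = x, x
    for k, w in enumerate(kernels):
        feats = prelu(conv3x3(feats, w))
        if k % 2 == 1:              # transfer features every other unit
            feats = feats + skip
            skip = feats
    return feats
```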
The loss function of the generator defined by the invention contains three parts: adversarial loss, multi-scale structural similarity loss, and multi-scale Euclidean distance loss. The multi-scale structural similarity loss is defined as:

L_SSIM = -Σ_i λ_i · SSIM(M_i, T)

where M_i represents the output of the generator sub-network at iteration i, T represents the target image, and λ_i represents the weight of the multi-scale structural similarity calculated at each iteration; with the weight list [0.2, 0.4, 0.6, 0.8, 1.0], later iterations receive larger weights.
The multi-scale Euclidean distance loss is defined as:

L_E = Σ_i λ_i · Σ_j (M_(i,j) - T_j)^2

where M_i is the feature map output by the generator at iteration i, T is the target image, j indexes the elements, i is the iteration number, and λ_i is the Euclidean-distance loss weight calculated at each iteration; later iterations receive larger weights.
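The weighted per-iteration loss above can be sketched as follows (mean over elements is used here; whether the patent sums or averages over j is an assumption):

```python
import numpy as np

LAMBDAS = [0.2, 0.4, 0.6, 0.8, 1.0]   # later iterations weigh more

def multiscale_l2_loss(outputs, target, lambdas=LAMBDAS):
    """Weighted sum over iterations of the squared error between each
    iteration's output M_i and the target T."""
    return sum(l * np.mean((m - target) ** 2)
               for l, m in zip(lambdas, outputs))
```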
The adversarial loss is defined as:

L_GAN_G = log(1 - D(G(x)))
where x is the original rain image, G(x) is the rain-free image generated by the generator, and D(G(x)) is the output of the discriminator. In summary, the generator loss can be written as:

L_G = L_GAN_G + β_1 · L_E + β_2 · L_SSIM

where β_1 and β_2 are the weights of the L2 (multi-scale Euclidean distance) loss and the multi-scale structural similarity loss, respectively.
To make full use of global and local information, the discriminator network design combines global consistency with local consistency. Here, global features are used to check consistency with the target image, while local consistency is judged using the feature maps in the generator and the discriminator.
The generated image and the target image are respectively input into the discriminator, and features are extracted stage by stage, forming a feature map at a certain stage; the feature maps formed in the discriminator by the target image and the generated derained image are used to compute the feature loss. At a later stage, the feature map is multiplied by the original features and the product is fed into the remaining discriminator layers; the aim of forming the input from this product is to use the previously extracted rain-streak information to guide subsequent judgment. The last two layers are fully connected so that the judgment result can be output at the end. The specific network structure is shown in fig. 5.
During feature extraction, the discriminator constrains the spectral norm of each layer's weight matrix, i.e., spectral normalization, so that the mapping function of the discriminator satisfies the Lipschitz constraint and training stability is enhanced. The loss function of the discriminator comprises cross-entropy loss, feature-mapping loss, and R2 regularization loss. The feature-mapping loss is defined as follows:
Lmask=Lmse(Mfake,G(x))+Lmse(Mreal,M0)
where MSE denotes the Euclidean distance, Mfake denotes the feature map the discriminator extracts when the generated image is its input, Mreal denotes the feature map it extracts when a real rain-free image is its input, and M0 denotes a tensor of the same shape as Mfake and Mreal whose elements are all 0. This loss function is intended to concentrate the discriminator's attention on the rain-streak regions extracted by the generator; for a real rain-free image there are no rain streaks, hence no regions requiring concentrated attention, which is expressed by the all-zero tensor.
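A minimal sketch of this loss follows. The text writes the first term as Lmse(Mfake, G(x)); here the `gen_feature_map` parameter stands in for that G(x) term (interpreted as the generator's rain-streak feature map), which is an assumption about the intended shapes rather than the patent's exact implementation.

```python
import numpy as np

def mse(a, b):
    # Mean squared (Euclidean) distance between two tensors.
    return float(np.mean((a - b) ** 2))

def feature_map_loss(m_fake, m_real, gen_feature_map):
    # L_mask = MSE(M_fake, G(x)) + MSE(M_real, M0): the fake-path feature map
    # is pulled toward the generator's rain-streak feature map, while the
    # real-path feature map is pulled toward an all-zero tensor, since a real
    # rain-free image contains no rain streaks to attend to.
    m0 = np.zeros_like(m_real)
    return mse(m_fake, gen_feature_map) + mse(m_real, m0)
```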
The R2 regularization loss was proposed by Lars Mescheder et al. in 2018; the authors found that when the data and generator distributions are continuous and do not overlap, the classical GAN cannot converge, and they therefore designed the R2 regularization penalty term, expressed as follows:
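The formula image is not reproduced here. In Mescheder et al.'s formulation ("Which Training Methods for GANs do actually Converge?", 2018), the R2 penalty regularizes the discriminator's gradients on the generated (fake) distribution:

```latex
R_2(\psi) = \frac{\gamma}{2}\, \mathbb{E}_{x \sim p_\theta}
  \left[ \left\| \nabla_x D_\psi(x) \right\|^2 \right]
```

where p_θ is the generator distribution and γ is a penalty weight (the analogous R1 penalty is taken over the real-data distribution).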
the cross entropy loss is defined as follows:
LGAN_D=-logD(y)-log(1-D(G(x)))
where y is the corresponding clean rain-free image and x is the original rain image. The overall loss of the discriminator is thus:
where δ1 and δ2 are the weights of the feature-mapping loss and the R2 regularization term, set to 100 and 1000 respectively. The discriminator network comprises 13 convolutional layers with kernel size 5 and two fully connected layers; the first fully connected layer outputs 1024 values and the second outputs 1, judging whether the image is a real rain-free image. During feature extraction, the result of the eighth convolutional layer is taken as the feature-map output and is later multiplied with the original features to guide the judgment of rain-streak regions.
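Putting the pieces together, below is a minimal sketch of the total discriminator loss, assuming scalar discriminator outputs in (0, 1) and treating the feature-map loss and R2 penalty as precomputed scalars; the function name and signature are illustrative.

```python
import numpy as np

# Loss weights stated in the text: delta1 = 100 (feature map), delta2 = 1000 (R2).
DELTA1, DELTA2 = 100.0, 1000.0

def discriminator_loss(d_real, d_fake, l_mask, r2_penalty,
                       delta1=DELTA1, delta2=DELTA2):
    # L_D = [-log D(y) - log(1 - D(G(x)))] + delta1 * L_mask + delta2 * R2
    eps = 1e-12  # numerical guard against log(0)
    ce = -np.log(d_real + eps) - np.log(1.0 - d_fake + eps)
    return float(ce + delta1 * l_mask + delta2 * r2_penalty)
```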
The basic idea of the present invention evolves from that of the generative adversarial network; fig. 6 illustrates the game-theoretic idea of the GAN. The generator aims to produce rain-free images realistic enough to fool the discriminator, while the discriminator aims to determine whether its input is rain-free image data produced by the generator or real rain-free image data; the two optimize each other in the course of this game. The overall algorithm flow is shown in fig. 7.
The invention builds the operating and test environment of the whole algorithm; the designed rain-removal model removes rain streaks and retains sufficient background detail while keeping the time cost as low as possible. In testing, removing rain from each image takes about 0.02 s, greatly improving rain-removal efficiency.
The present invention is capable of other embodiments, and various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention.
Claims (9)
1. A dense residual generative adversarial network for fast rain removal, comprising the steps of:
a. building the algorithm operating environment;
b. building a rain-removal data set;
c. designing the dense residual generator subnetwork for fast rain removal;
d. designing the dense residual discriminator subnetwork for fast rain removal;
e. designing the loss functions of the discriminator and the generator;
f. performing image post-processing and index calculation, and verifying the rain-removal effect of the algorithm model.
2. The dense residual generative adversarial network for fast rain removal of claim 1, wherein: in step a, a Linux system is first installed, a driver is installed according to the computer's graphics-card version, the TensorFlow version to be used is determined after the driver is installed, and the CUDA and cuDNN versions are determined according to the driver and TensorFlow versions.
3. The dense residual generative adversarial network for fast rain removal of claim 1, wherein step b includes:
b1, first building a synthetic rain-image data set: considering the size, direction, and density of rain streaks, rain streaks are added with Photoshop software to a collected rain-free image data set;
b2, then building a real rain-image data set: rain-image data are acquired by web crawling and by photographing real rain scenes;
b3, mixing the real and synthetic rain-image data sets and dividing them into training and test sets at a 7:3 ratio.
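Step b3 can be sketched as follows; the function name, the fixed seed, and the pair representation are illustrative assumptions.

```python
import random

def split_dataset(real_pairs, synth_pairs, seed=0):
    # Mix real and synthetic rain images, then split 7:3 into
    # training and test sets (pairs are, e.g., (rainy, clean) items).
    data = list(real_pairs) + list(synth_pairs)
    random.Random(seed).shuffle(data)
    k = int(len(data) * 0.7)
    return data[:k], data[k:]
```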
4. The dense residual generative adversarial network for fast rain removal of claim 3, wherein step c includes:
c1, preprocessing the input image: randomly cropping it to the specified size of 250 x 250, randomly flipping it, and normalizing it to the range [-1, 1]; this data enhancement and normalization generalizes the model and avoids overfitting;
c2, randomly shuffling the input data set and feeding it into the generator subnetwork;
c3, passing the input image through the LSTM (long short-term memory) network module of the generator subnetwork, which decides whether pixel values and feature points are forgotten, input, and output;
c4, after the convolution and feature selection of the LSTM, inputting the result into the dense residual network of the generator.
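The preprocessing of step c1 above can be sketched as follows, assuming an HxWx3 uint8 input array; the horizontal-only flip and the fixed random seed are illustrative choices not specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(img, size=250):
    # Random crop to size x size, random horizontal flip,
    # then normalization of uint8 pixel values into [-1, 1].
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    patch = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # random flip
    return patch.astype(np.float32) / 127.5 - 1.0
```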
5. The dense residual generative adversarial network for fast rain removal of claim 4, wherein: in step c4, the dense residual network is trained as a whole with single convolutional layers and PReLU activation functions, and features are passed between layers.
6. The dense residual generative adversarial network for fast rain removal of claim 4, wherein step d includes:
d1, applying spectral normalization to the convolution kernels in the convolution process of the discriminator subnetwork, so that the data distribution satisfies the Lipschitz constraint;
d2, inputting the rain-free image generated by the generator into the discriminator subnetwork for convolution, feature extraction, and judgment; feature maps are generated between the discriminator levels and, together with the generator's feature map, are used to calculate the discriminator's loss function, thereby supervising the discriminator to focus its discrimination on the rain-streak regions of the original rain image as much as possible;
d3, the discriminator network contains 13 convolutional layers with kernel size 5 and two fully connected layers; the first fully connected layer outputs 1024 values and the second outputs 1, judging whether the image is a real rain-free image; during feature extraction, the result of the eighth convolutional layer is taken as the feature-map output and is later multiplied with the original features to guide the judgment of rain-streak regions;
d4, the discriminator outputs its judgment; the discriminator output, the generator output, and the feature maps are fed into the loss-function module, and the loss functions are calculated and optimized.
7. The dense residual generative adversarial network for fast rain removal of claim 6, wherein step e includes:
e1, the generator loss function comprises: adversarial loss, multi-scale structural similarity loss, multi-scale peak signal-to-noise ratio loss, and multi-scale Euclidean distance loss;
e2, the discriminator loss function comprises three parts: cross-entropy loss, feature-mapping loss, and R2 regularization loss;
e3, the discriminator loss is calculated first and its gradient is used to optimize the discriminator; after a certain number of training steps the generator loss is calculated and the generator is optimized, the generator being trained once for every three discriminator training steps.
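The alternating schedule of step e3 can be sketched as below, where `d_step` and `g_step` stand for one discriminator and one generator optimizer update respectively; the loop structure is an illustrative interpretation of "the generator is trained once for every three discriminator steps".

```python
def train(steps, d_step, g_step, d_per_g=3):
    # The discriminator is updated at every step; the generator is
    # updated once for every d_per_g discriminator updates.
    for step in range(steps):
        d_step()
        if (step + 1) % d_per_g == 0:
            g_step()
```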
8. The dense residual generative adversarial network for fast rain removal of claim 7, wherein step f includes:
f1, after a certain number of steps, post-processing the image and calculating the indexes;
f2, saving the model and related parameters after a certain number of steps.
9. The dense residual generative adversarial network for fast rain removal of claim 8, wherein: in step f1, the image indexes adopt two evaluation metrics: peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).
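Of the two metrics named in claim 9, PSNR has a short standard definition that can be sketched directly (MS-SSIM is considerably longer and is omitted here); the 8-bit peak value of 255 is an assumption.

```python
import numpy as np

def psnr(img, ref, max_val=255.0):
    # Peak signal-to-noise ratio between a derained image and its ground
    # truth: 10 * log10(MAX^2 / MSE), in decibels.
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```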
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011061765.8A CN112258402A (en) | 2020-09-30 | 2020-09-30 | Dense residual generation countermeasure network capable of rapidly removing rain |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112258402A true CN112258402A (en) | 2021-01-22 |
Family
ID=74234740
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011061765.8A Pending CN112258402A (en) | 2020-09-30 | 2020-09-30 | Dense residual generation countermeasure network capable of rapidly removing rain |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112258402A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978762A (en) * | 2019-02-27 | 2019-07-05 | 南京信息工程大学 | A kind of super resolution ratio reconstruction method generating confrontation network based on condition |
CN110807749A (en) * | 2019-11-06 | 2020-02-18 | 广西师范大学 | Single image raindrop removing method based on dense multi-scale generation countermeasure network |
CN110992275A (en) * | 2019-11-18 | 2020-04-10 | 天津大学 | Refined single image rain removing method based on generation countermeasure network |
CN111145112A (en) * | 2019-12-18 | 2020-05-12 | 华东师范大学 | Two-stage image rain removing method and system based on residual error countermeasure refinement network |
CN111161360A (en) * | 2019-12-17 | 2020-05-15 | 天津大学 | Retinex theory-based image defogging method for end-to-end network |
WO2020168731A1 (en) * | 2019-02-19 | 2020-08-27 | 华南理工大学 | Generative adversarial mechanism and attention mechanism-based standard face generation method |
Worldwide Applications (1)
Year | Country | Application | Publication | Status
---|---|---|---|---
2020 | CN | CN202011061765.8A (filed 2020-09-30) | CN112258402A | active, Pending
Non-Patent Citations (2)
Title |
---|
HE ZHANG等: "Image De-Raining Using a Conditional Generative Adversarial Network", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》 * |
蒙佳浩等: "基于生成对抗网络去除单张图像中的雨滴", 《软件》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113160078A (en) * | 2021-04-09 | 2021-07-23 | 长安大学 | Method, device and equipment for removing rain from traffic vehicle image in rainy day and readable storage medium |
CN113160078B (en) * | 2021-04-09 | 2023-01-24 | 长安大学 | Method, device and equipment for removing rain from traffic vehicle image in rainy day and readable storage medium |
CN114820379A (en) * | 2022-05-12 | 2022-07-29 | 中南大学 | Image rain layer removing method for generating countermeasure network based on attention dual residual error |
CN114820379B (en) * | 2022-05-12 | 2024-04-26 | 中南大学 | Image rain-like layer removing method for generating countermeasure network based on attention dual residual error |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| AD01 | Patent right deemed abandoned | Effective date of abandoning: 20231229 |