CN112085678A - Method and system suitable for removing raindrops from power equipment machine patrol image - Google Patents


Publication number
CN112085678A
Authority
CN
China
Prior art keywords
image
network
raindrop
raindrops
attention
Prior art date
Legal status
Granted
Application number
CN202010923502.7A
Other languages
Chinese (zh)
Other versions
CN112085678B (en)
Inventor
吴毅翔
陈月卿
胡琳
张振兴
郑剑辉
王媛婷
林耀洲
陈荔芬
祁琦
林丽琴
Current Assignee
State Grid Fujian Electric Power Co Ltd
Maintenance Branch of State Grid Fujian Electric Power Co Ltd
Original Assignee
State Grid Fujian Electric Power Co Ltd
Maintenance Branch of State Grid Fujian Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Fujian Electric Power Co Ltd, Maintenance Branch of State Grid Fujian Electric Power Co Ltd
Priority to CN202010923502.7A
Publication of CN112085678A
Application granted
Publication of CN112085678B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a method and a system for removing raindrops from machine-patrol images of power equipment, comprising the following steps: constructing a generative adversarial network comprising a generation network and a discrimination network; in the training stage, exploiting the game between the generation network and the discrimination network so that, guided by attention maps, the generation network generates raindrop-removed images that satisfy the required conditions; in the use stage, inputting an image containing raindrops into the trained generator to obtain a raindrop-removed image.

Description

Method and system suitable for removing raindrops from power equipment machine patrol image
Technical Field
The invention relates to the technical field of power engineering, and in particular to a method and a system for removing raindrops from machine-patrol images of power equipment.
Background
Inspection images are the most important information carrier in unmanned aerial vehicle (UAV) line patrol: target detection and fault location on machine-patrol images are what make intelligent inspection possible. During field line patrol, raindrops sometimes adhere to the UAV camera. Raindrops occlude target-object information in the background image and reduce image quality. When the power equipment is imaged, raindrops gather ambient light from a wide range of directions, and the superposition of this refracted light with the light reflected by the target object degrades the image. In addition, the camera should focus on the power equipment during line-patrol photography; raindrops interfere with focusing, blurring the image background and causing severe loss of image detail, which makes subsequent processing of raindrop-contaminated machine-patrol images extremely difficult. Raindrops therefore make the quality of captured machine-patrol images uneven, which hampers the extraction and use of image information and reduces the accuracy and reliability of target detection.
Raindrop removal from a single image is an extremely complex problem in image processing, and related research at home and abroad is comparatively recent. Existing methods divide roughly into traditional raindrop-removal methods and methods based on convolutional neural networks (CNNs). The traditional methods split into filtering-based approaches and approaches based on dictionary learning and sparse coding; the filtering approaches include guided filtering, improved guided filtering, multi-pass guided filtering, L0 smoothing, non-local means filtering, and the like. The raindrop-removed images produced by filtering methods are blurry, and some raindrops cannot be removed. In 2013 a convolutional neural network was first applied to image raindrop removal: a sample library of raindrop/raindrop-free image pairs was constructed, the corresponding images were partitioned with a sliding window of stride 1, the network was trained on the mean square error between corresponding image blocks, and a convolutional neural network model capable of removing raindrops was finally obtained.
A survey of existing image raindrop-removal methods shows that most traditional methods are model-based: raindrops, rain streaks, and the background image are each described by a hand-crafted model, and raindrops are removed through step-by-step iterative optimization with a corresponding algorithm. Traditional methods perform poorly on images with dense raindrops and cannot finely restore the background in regions covered by raindrops. Raindrop-removal methods based on convolutional neural networks can fully extract the feature information of an image, and their deraining results are clearly superior to those of traditional methods. However, as network depth increases, the network becomes prone to overfitting, and it is difficult to improve the raindrop-removal effect further.
Disclosure of Invention
In view of this, the present invention provides a method and a system for removing raindrops from machine-patrol images of power equipment, so that the obtained raindrop-removed image is closer to the real image.
The invention is realized by the following scheme: a method for removing raindrops from machine-patrol images of power equipment, comprising the following steps:
constructing a generative adversarial network comprising a generation network and a discrimination network;
in the training stage, exploiting the game between the generation network and the discrimination network so that, guided by attention maps, the generation network generates raindrop-removed images that satisfy the required conditions;
in the use stage, inputting an image containing raindrops into the trained generator to obtain a raindrop-removed image.
Further, the overall loss function of the generative adversarial network is:
min_G max_D  E_{R~P_clean}[log(D(R))] + E_{I~P_raindrop}[log(1 − D(G(I)))]
where G denotes the generation network, D the discrimination network, I an image containing raindrops, R a real raindrop-free sample, and G(I) the raindrop-removed image; E_{R~P_clean} denotes the expectation over the distribution of raindrop-free images, and E_{I~P_raindrop} the expectation over the distribution of raindrop-containing images.
Further, the input of the generation network is an image pair with exactly the same background scene, comprising an image containing raindrops and an image without raindrops, and the output is a raindrop-removed image; the generation network comprises an attention recurrent network and a context autoencoder;
the attention recurrent network comprises one or more recurrent modules and generates an attention map by iterating over them; the attention map contains the position information of the raindrops in the raindrop image and guides the context autoencoder to focus on the raindrops and their surrounding area;
each recurrent module comprises one or more residual blocks, an LSTM unit, and a convolutional layer; the output of the LSTM unit in each recurrent module is, on the one hand, fed into the convolutional layer of that module to generate a 2-dimensional attention map and, on the other hand, fed into the LSTM unit of the next recurrent module to retain features along the time dimension;
the loss function L_ATT({A}, M) of each recurrent module is the mean square error between the output attention map A and the binary mask M, as follows:
L_ATT({A}, M) = Σ_{t=1}^{N} θ^(N−t) · L_MSE(A_t, M)
where A_t denotes the attention map generated by the attention recurrent network at time step t, the function ATT_t denotes the recurrent module at time step t, and F_{t-1} denotes the fusion of the raindrop-containing image with the attention map output by the previous recurrent module, i.e. A_t = ATT_t(F_{t-1}); N is the number of recurrent modules, θ is a weighting factor between 0 and 1, and L_MSE denotes the mean square error function.
Further, θ is 0.8.
Further, the value of N is 4.
Further, the inputs of the context autoencoder are the raindrop-containing image and the attention map output by the attention recurrent network; guided by the attention map, it performs raindrop removal and background restoration;
the context autoencoder comprises 16 conv-relu modules; the encoder and decoder parts are symmetric in structure, and skip connections are added between corresponding modules to prevent the raindrop-removed image from blurring.
Further, the context autoencoder adopts two loss functions, namely a multi-scale loss and a perceptual loss;
the multi-scale loss function extracts image feature information from different layers of the decoder, making full use of multi-level image information to optimize the model and obtain a clear raindrop-removed image. The multi-scale loss L_M({S}, {A}) is as follows:
L_M({S}, {A}) = Σ_{i=1}^{M} λ_i · L_MSE(S_i, A_i^N)
where S_i denotes the image features extracted from the i-th layer of the decoder, A_i^N denotes the ground-truth image scaled to the same size as S_i, λ_i denotes the weight of the i-th layer, M is the total number of layers used, and L_MSE denotes the mean square error function;
in addition to the pixel-level multi-scale loss, a perceptual loss is added to capture the global difference between the output of the context autoencoder and the corresponding clear image. The perceptual loss measures the difference between the raindrop-removed image and the real image from a global perspective, which makes the raindrop-removed image closer to the real sample. Global image information is extracted with a VGG16 network pre-trained on the dataset beforehand. The perceptual loss L_P is calculated as follows:
L_P(O, T) = L_MSE(VGG(O), VGG(T));
where VGG is a pre-trained CNN that performs feature extraction on a given input image, O is the output image of the autoencoder, T is the real raindrop-free image sample, and L_MSE denotes the mean square error function.
Feature extraction is performed from an inner layer of the discrimination network using a CNN; feature extraction is simultaneously performed on the raindrop-removed image produced by the generation network, and the resulting feature map is combined with the attention map to form the loss function of the local discriminator. The attention map guides the discrimination network to focus on the raindrop regions of the image, and a fully connected layer at the last layer of the discrimination network judges whether the input image is real or fake;
the overall loss function of the discrimination network is as follows:
L_D(O, R, A_N) = −log(D(R)) − log(1 − D(O)) + γ·L_map(O, R, A_N);
where γ is 0.05, the first two terms are the loss function of the global discriminator, L_map denotes the loss function of the local discriminator, O is the output image of the autoencoder, A_N denotes the attention map produced by the attention recurrent network, and R is a sample image drawn from the library of real clear images;
the loss function L_map(O, R, A_N) of the local discriminator is as follows:
L_map(O, R, A_N) = L_MSE(D_map(O), A_N) + L_MSE(D_map(R), 0);
where D_map denotes the two-dimensional attention-mask mapping function produced by the discrimination network, and 0 denotes an attention map containing only zero values: the real image contains no raindrops, so the network need not be guided to attend to any region for feature extraction.
Furthermore, the discrimination network comprises 9 convolutional layers and a fully connected layer; the kernel of each convolutional layer is (3, 3), the fully connected layer has 1024 units, and the single output neuron uses a Sigmoid activation function.
The invention also provides a system for removing raindrops from machine-patrol images of power equipment, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor; when the computer instructions are executed by the processor, the method steps described above are realized.
Compared with the prior art, the invention has the following beneficial effects: based on the raindrop imaging principle, a raindrop image generation model is fused into the generative adversarial network, and global and local discriminators are introduced into the discrimination network to supervise the generation of the raindrop-removed image; raindrops in the image are removed and the background in the raindrop-covered regions is restored, so that the image is closer to the real image while raindrops are removed.
Drawings
Fig. 1 is a flow chart of training the generative adversarial network according to an embodiment of the present invention.
Fig. 2 is a structure of a raindrop removing network according to an embodiment of the present invention.
Fig. 3 is a block diagram of a context auto-encoder according to an embodiment of the present invention.
FIG. 4 is an original image of an embodiment of the present invention.
Fig. 5 is a raindrop attention diagram 1 according to an embodiment of the present invention.
Fig. 6 is a raindrop attention diagram 2 according to an embodiment of the present invention.
Fig. 7 is a raindrop attention diagram 3 according to an embodiment of the present invention.
Fig. 8 is a raindrop attention diagram 4 according to an embodiment of the present invention.
Fig. 9 is the final raindrop-removed image according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiment provides a method for removing raindrops from machine-patrol images of power equipment, comprising the following steps:
constructing a generative adversarial network comprising a generation network and a discrimination network;
in the training stage, exploiting the game between the generation network and the discrimination network so that, guided by attention maps, the generation network generates raindrop-removed images that satisfy the required conditions; the training process is shown in fig. 1: in each training round, the parameters of the generation network and the discrimination network are updated in turn, and training stops when a preset number of training rounds is reached;
in the use stage, inputting an image containing raindrops into the trained generator to obtain a raindrop-removed image.
The overall network architecture of this embodiment is shown in fig. 2, and the overall loss function of the generative adversarial network is:
min_G max_D  E_{R~P_clean}[log(D(R))] + E_{I~P_raindrop}[log(1 − D(G(I)))]
where G denotes the generation network, D the discrimination network, I an image containing raindrops, R a real raindrop-free sample, and G(I) the raindrop-removed image; E_{R~P_clean} denotes the expectation over the distribution of raindrop-free images, and E_{I~P_raindrop} the expectation over the distribution of raindrop-containing images.
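The two expectation terms of this minimax objective can be estimated from a batch of discriminator scores. Below is a minimal numpy sketch; the function name and the probability-valued scores are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def gan_objective(d_real, d_fake, eps=1e-12):
    """Monte-Carlo estimate of E_{R~P_clean}[log D(R)] +
    E_{I~P_raindrop}[log(1 - D(G(I)))].
    d_real: discriminator probabilities on clean samples R.
    d_fake: discriminator probabilities on derained outputs G(I)."""
    term_clean = np.mean(np.log(d_real + eps))      # expectation over clean images
    term_raindrop = np.mean(np.log(1.0 - d_fake + eps))  # expectation over derained images
    return term_clean + term_raindrop
```

The discrimination network is trained to maximize this quantity, while the generation network is trained to minimize the second term by making D(G(I)) approach 1.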
The input of the generation network is an image pair with the same background scene, comprising an image containing raindrops (as shown in fig. 4) and an image without raindrops, and the output is a raindrop-removed image; the generation network comprises an attention recurrent network and a context autoencoder;
the attention recurrent network comprises one or more recurrent modules and generates an attention map by iterating over them; the attention map contains the position information of the raindrops in the raindrop image and guides the context autoencoder to focus on the raindrops and their surrounding area, as shown in fig. 5;
each recurrent module comprises one or more residual blocks, an LSTM unit, and a convolutional layer; the output of the LSTM unit in each recurrent module is, on the one hand, fed into the convolutional layer of that module to generate a 2-dimensional attention map and, on the other hand, fed into the LSTM unit of the next recurrent module to retain features along the time dimension;
The LSTM unit comprises a forget gate f_t, an input gate i_t, an output gate o_t, and a cell state c_t that is passed on to the next LSTM unit. The interaction between states and gates along the time dimension is defined as follows:
f_t = σ(W_f·[h_{t-1}, x_t] + b_f)
i_t = σ(W_i·[h_{t-1}, x_t] + b_i)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c·[h_{t-1}, x_t] + b_c)
o_t = σ(W_o·[h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)
where x_t denotes the input feature of the LSTM unit at time t, h_t denotes the output at time t, and h_{t-1} denotes the output of the LSTM unit at time t-1; W_f, W_i, W_c, and W_o denote the weight matrices of the forget gate, the input gate, the cell-state update, and the output gate, respectively; σ denotes the Sigmoid activation function and tanh the tanh function; b_f, b_i, b_c, and b_o denote the corresponding bias terms of the forget gate, input gate, cell-state update, and output gate.
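The gate equations above can be checked with a small self-contained numpy implementation. This is a generic single LSTM step under the standard cell-state update, with an illustrative weight layout; it is a sketch, not the patent's actual network code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[k] maps the concatenation [h_prev, x_t] to the
    hidden size for k in {'f', 'i', 'c', 'o'}; b[k] are the biases."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])              # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])              # input gate
    g_t = np.tanh(W["c"] @ z + b["c"])              # candidate cell state
    c_t = f_t * c_prev + i_t * g_t                  # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])              # output gate
    h_t = o_t * np.tanh(c_t)                        # hidden state / output
    return h_t, c_t
```

Because o_t lies in (0, 1) and |tanh(c_t)| < 1, every component of h_t stays inside (-1, 1), which is what allows the hidden state to be fed directly into the next recurrent module's convolutional layer.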
The loss function L_ATT({A}, M) of each recurrent module is the mean square error between the output attention map A and the binary mask M; the loss of an earlier recurrent module in the attention recurrent network is given a smaller weight, and the loss of a later module a larger weight. The formula is as follows:
L_ATT({A}, M) = Σ_{t=1}^{N} θ^(N−t) · L_MSE(A_t, M)
where A_t denotes the attention map generated by the attention recurrent network at time step t, the function ATT_t denotes the recurrent module at time step t, and F_{t-1} denotes the fusion of the raindrop-containing image with the attention map output by the previous recurrent module, i.e. A_t = ATT_t(F_{t-1}); N is the number of recurrent modules, θ is a weighting factor between 0 and 1, and L_MSE denotes the mean square error function.
In this embodiment, θ is 0.8 and N is 4; experiments show that the network is most efficient with these values. The resulting attention map is shown in fig. 6.
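With θ = 0.8 and N = 4, the four recurrent modules are weighted by θ^3, θ^2, θ^1, θ^0 = 0.512, 0.64, 0.8, 1. A numpy sketch of L_ATT (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def attention_loss(attention_maps, mask, theta=0.8):
    """L_ATT({A}, M) = sum over t of theta^(N - t) * MSE(A_t, M).
    Later modules get weights closer to 1, so the final attention map
    is pushed hardest toward the binary raindrop mask."""
    N = len(attention_maps)
    loss = 0.0
    for t, A_t in enumerate(attention_maps, start=1):
        mse = np.mean((A_t - mask) ** 2)
        loss += theta ** (N - t) * mse
    return loss
```

For example, four all-zero attention maps against an all-ones mask each incur unit MSE, giving a total of 0.512 + 0.64 + 0.8 + 1 = 2.952.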
In this embodiment, the inputs of the context autoencoder are the raindrop-containing image and the attention map output by the attention recurrent network; guided by the attention map, it performs raindrop removal and background restoration;
the context autoencoder comprises 16 conv-relu modules; the encoder and decoder parts are symmetric in structure, and skip connections are added between corresponding modules to prevent the raindrop-removed image from blurring.
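The mirrored skip-connection pattern of the 16-module (8 encoder + 8 decoder) structure can be illustrated with a toy forward pass in which real convolutions are replaced by identity-like stand-ins; only the data flow is shown, not any learned mapping:

```python
import numpy as np

def context_autoencoder_sketch(x, depth=8):
    """Toy data flow of a symmetric conv-relu autoencoder with skip
    connections: decoder module j receives the feature map saved by
    the mirrored encoder module (depth - 1 - j)."""
    skips = []
    h = x
    for _ in range(depth):               # encoder: 8 conv-relu modules
        h = np.maximum(h, 0.0)           # stand-in for conv + relu
        skips.append(h)
    for j in range(depth):               # decoder: 8 conv-relu modules
        h = np.maximum(h, 0.0)
        h = h + skips[depth - 1 - j]     # skip from the mirrored encoder module
    return h
```

The skips carry high-resolution detail from early encoder layers directly to late decoder layers, which is what counteracts the blurring mentioned above.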
In this embodiment, the context autoencoder uses two loss functions, namely a multi-scale loss and a perceptual loss;
the multi-scale loss function extracts image feature information from different layers of the decoder, making full use of multi-level image information to optimize the model and obtain a clear raindrop-removed image. The multi-scale loss L_M({S}, {A}) is as follows:
L_M({S}, {A}) = Σ_{i=1}^{M} λ_i · L_MSE(S_i, A_i^N)
where S_i denotes the image features extracted from the i-th layer of the decoder, A_i^N denotes the ground-truth image scaled to the same size as S_i, λ_i denotes the weight of the i-th layer (the weight at each scale), M is the total number of layers used, and L_MSE denotes the mean square error function;
in addition to the pixel-level multi-scale loss, a perceptual loss is added to capture the global difference between the output of the context autoencoder and the corresponding clear image. The perceptual loss measures the difference between the raindrop-removed image and the real image from a global perspective, which makes the raindrop-removed image closer to the real sample. Global image information is extracted with a VGG16 network pre-trained on the dataset beforehand. The perceptual loss L_P is calculated as follows:
L_P(O, T) = L_MSE(VGG(O), VGG(T));
where VGG is a pre-trained CNN that performs feature extraction on a given input image, O is the output image of the autoencoder, T is the real raindrop-free image sample, and L_MSE denotes the mean square error function.
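A sketch of the perceptual loss follows. The real method uses pre-trained VGG16 features; here a fixed random projection (`fake_vgg`, purely a placeholder assumption) stands in for the feature extractor so the sketch stays self-contained:

```python
import numpy as np

rng = np.random.default_rng(42)
_PROJ = rng.standard_normal((16, 64))  # stand-in for VGG16 feature maps

def fake_vgg(img):
    """Hypothetical feature extractor: a fixed linear projection of the
    flattened 8x8 image. A real implementation would run VGG16."""
    return _PROJ @ img.reshape(-1)

def perceptual_loss(O, T, features=fake_vgg):
    """L_P(O, T) = L_MSE(VGG(O), VGG(T)), comparing images in feature
    space rather than pixel space."""
    fo, ft = features(O), features(T)
    return np.mean((fo - ft) ** 2)
```

Because the comparison happens in feature space, two images that differ only by imperceptible pixel noise incur a much smaller penalty than images with different global content.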
The network design focuses more on feature extraction at larger sizes; layers with smaller outputs contain less information and have little influence on model optimization. The output sizes of the last, third-from-last, and fifth-from-last layers of the decoder are 1/4, 1/2, and 1 of the original size, respectively, and the corresponding weights λ are set to 0.6, 0.8, and 1.0. The result of the final attention map is shown in fig. 7.
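The scale/weight pairing (1/4, 1/2, 1 with λ = 0.6, 0.8, 1.0) can be sketched with average pooling standing in for rescaling the ground truth; function names are illustrative, not the patent's code:

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a square image by an integer factor (stand-in for
    resizing the ground truth to a decoder layer's scale)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multiscale_loss(decoder_outputs, ground_truth,
                    factors=(4, 2, 1), weights=(0.6, 0.8, 1.0)):
    """L_M({S}, {A}) = sum_i lambda_i * MSE(S_i, A_i^N), with the ground
    truth rescaled to 1/4, 1/2 and full resolution."""
    loss = 0.0
    for S_i, f, lam in zip(decoder_outputs, factors, weights):
        A_i = downsample(ground_truth, f)
        loss += lam * np.mean((S_i - A_i) ** 2)
    return loss
```

An error of unit MSE at the coarsest scale contributes only 0.6 to the total, while the same error at full resolution contributes 1.0, matching the stated emphasis on the larger scales.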
In this embodiment, the role of the discrimination network is to distinguish real samples from fake ones, i.e. to distinguish the images output by the generator from real samples. Judging image authenticity from global information alone does not help the generation network restore local image information, yet for subsequent target detection the image should recover as much detail as possible, so an existing discrimination network cannot be used directly. A global discriminator and a local discriminator are therefore combined to judge whether a sample was produced by the generation network; however, using a local discriminator requires knowing the positions of the raindrops in the image. Since the attention recurrent network already generates an attention map during the image restoration stage, the problem of locating raindrops in the image is solved. This embodiment therefore introduces the attention map into the discrimination network, where it guides the local discriminator to find raindrop regions in the image automatically.
The construction of the discrimination network of this embodiment is as follows:
feature extraction is performed from an inner layer of the discrimination network using a CNN; feature extraction is simultaneously performed on the raindrop-removed image produced by the generation network, and the resulting feature map is combined with the attention map to form the loss function of the local discriminator. The attention map guides the discrimination network to focus on the raindrop regions of the image, and a fully connected layer at the last layer of the discrimination network judges whether the input image is real or fake;
the overall loss function of the discrimination network is as follows:
L_D(O, R, A_N) = −log(D(R)) − log(1 − D(O)) + γ·L_map(O, R, A_N);
where γ is 0.05, the first two terms are the loss function of the global discriminator, L_map denotes the loss function of the local discriminator, O is the output image of the autoencoder, A_N denotes the attention map produced by the attention recurrent network, and R is a sample image drawn from the library of real clear images;
the loss function L_map(O, R, A_N) of the local discriminator is as follows:
L_map(O, R, A_N) = L_MSE(D_map(O), A_N) + L_MSE(D_map(R), 0);
where D_map denotes the two-dimensional attention-mask mapping function produced by the discrimination network, and 0 denotes an attention map containing only zero values: the real image contains no raindrops, so the network need not be guided to attend to any region for feature extraction, as shown in fig. 8.
In this embodiment, the discrimination network comprises 9 convolutional layers and a fully connected layer; the kernel of each convolutional layer is (3, 3), the fully connected layer has 1024 units, and the single output neuron uses a Sigmoid activation function.
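The combined global-plus-local discriminator loss with γ = 0.05 can be sketched in numpy, treating the discriminator's outputs as a scalar probability and a 2-D attention mask; all names here are illustrative, not the patent's code:

```python
import numpy as np

def local_map_loss(d_map_fake, d_map_real, attention_map):
    """L_map(O, R, A_N) = MSE(D_map(O), A_N) + MSE(D_map(R), 0):
    on fake inputs the mask should match the attention map, on real
    inputs it should be all zeros (no raindrops to attend to)."""
    return (np.mean((d_map_fake - attention_map) ** 2)
            + np.mean(d_map_real ** 2))

def discriminator_total_loss(d_fake, d_real, d_map_fake, d_map_real,
                             attention_map, gamma=0.05, eps=1e-12):
    """L_D = -log(D(R)) - log(1 - D(O)) + gamma * L_map."""
    global_term = -np.log(d_real + eps) - np.log(1.0 - d_fake + eps)
    return global_term + gamma * local_map_loss(
        d_map_fake, d_map_real, attention_map)
```

With a perfect global discriminator (D(R) = 1, D(O) = 0) the loss reduces to γ·L_map, so the small γ keeps the local term a gentle correction rather than the dominant signal.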
The final raindrop-removed image of this embodiment is shown in fig. 9.
The embodiment also provides a system for removing raindrops from machine-patrol images of power equipment, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor; when the processor executes the computer instructions, the method steps described above are realized.
Specifically, training the raindrop-removal network proposed in this embodiment requires a set of power-equipment image pairs, each pair sharing the same background scene, with one image containing raindrops and the other raindrop-free. So that the method of this embodiment is best suited to deraining images from the UAV line-patrol scenario, the actual operating conditions of the power equipment were simulated as closely as possible when building the dataset. Two cameras mounted on the UAV shoot through two identical glass plates: one sprayed with water and one kept clean. Spraying water onto a glass plate simulates raindrops on the camera in rainy weather; the glass plates are 3 mm thick. The distance between the glass and the camera is set to 2 to 5 cm to produce varied raindrop images while minimizing reflections from the glass. The relative position of each camera and its glass plate is kept fixed during shooting to guarantee that the background images captured by the two cameras are the same, and atmospheric conditions (e.g., sunlight, clouds) and background objects are kept static while each image pair is acquired.
Of the 2000 image pairs obtained, this embodiment uses 1600 pairs as the model training set and 400 pairs as the model test set. The hyper-parameters of the model are set with an initial learning rate of 0.001, a batch size of 16, and 40000 iterations. After 40000 iterations of training, the test set is fed into the model for verification, and the raindrop-removal model based on the generative adversarial network is found to generalize well.
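The alternating update schedule with these hyper-parameters can be outlined as follows. The gradient updates are deliberately left as placeholders, so this shows only the training-loop structure described above, under assumed function and variable names:

```python
import numpy as np

def train(image_pairs, iterations=40000, batch_size=16, lr=0.001):
    """Schematic training loop matching the embodiment's settings
    (learning rate 0.001, batch size 16, 40000 iterations). The two
    commented steps are placeholders; a real implementation would
    backpropagate the generator and discriminator losses there."""
    rng = np.random.default_rng(0)
    steps = {"D": 0, "G": 0}
    for _ in range(iterations):
        idx = rng.integers(0, len(image_pairs), size=batch_size)
        batch = [image_pairs[i] for i in idx]
        assert len(batch) == batch_size
        # 1. update the discrimination network on real vs. generated samples
        steps["D"] += 1
        # 2. update the generation network against the frozen discriminator
        steps["G"] += 1
    return steps
```

The key structural point is that the discriminator and generator are updated in turn within every iteration, matching the "updated in sequence in each training round" description of fig. 1.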
The foregoing is directed to preferred embodiments of the present invention; other embodiments may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification, equivalent change, or adaptation of the above embodiments in accordance with the technical essence of the present invention remains within the protection scope of the technical solution of the present invention.

Claims (10)

1. A method for removing raindrops from machine-patrol images of power equipment, characterized by comprising the following steps:
constructing a generative adversarial network comprising a generation network and a discrimination network;
in the training stage, exploiting the game between the generation network and the discrimination network so that, guided by attention maps, the generation network generates raindrop-removed images that satisfy the required conditions;
in the use stage, inputting an image containing raindrops into the trained generator to obtain a raindrop-removed image.
2. The method for removing raindrops from power equipment machine-patrol images according to claim 1, wherein the overall loss function of the generative adversarial network is as follows:
min_G max_D E_{R~p_clean}[log(D(R))] + E_{I~p_raindrop}[log(1 − D(G(I)))]
wherein G denotes the generation network, D denotes the discrimination network, I is an image containing raindrops, R is a real raindrop-free sample, G(I) is the image with raindrops removed, E_{R~p_clean} denotes the expectation over the distribution of raindrop-free images, and E_{I~p_raindrop} denotes the expectation over the distribution of raindrop-containing images.
3. The method for removing raindrops from power equipment machine-patrol images according to claim 1, wherein the input of the generation network is an image pair with the same background scene, the pair comprising an image containing raindrops and a raindrop-free image, and the output is a raindrop-removed image; the generation network comprises an attentive recurrent network and a contextual autoencoder;
the attentive recurrent network comprises one or more recurrent modules and generates an attention map by iteration; the attention map contains the position information of the raindrops in the raindrop image and guides the contextual autoencoder to focus on the raindrops and their surrounding area;
each recurrent module comprises one or more residual blocks, an LSTM unit and a convolutional layer; the output of the LSTM unit in each recurrent module is, on the one hand, input to the convolutional layer of that module to generate a two-dimensional attention map and, on the other hand, input to the LSTM unit of the next recurrent module to retain features along the time dimension;
the loss function L_ATT({A}, M) of the recurrent modules is the weighted mean squared error between the output attention maps A and the binary mask M, as follows:
L_ATT({A}, M) = Σ_{t=1}^{N} θ^{N−t} · L_MSE(A_t, M)
wherein A_t denotes the attention map generated by the attentive recurrent network at time step t; the function ATT_t denotes the recurrent module at time step t; F_{t−1} denotes the fusion of the raindrop-containing image with the attention map output by the previous recurrent module; N is the number of recurrent modules; θ is a weight coefficient taking a value between 0 and 1; and L_MSE denotes the mean squared error function.
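A minimal NumPy sketch of the weighted attention loss above (an illustration under the stated formula; `attention_loss` is a hypothetical name). Since θ < 1, later time steps receive larger weight θ^{N−t}:

```python
# Weighted MSE between each time step's attention map A_t and the binary
# raindrop mask M, with weight theta**(N - t) as in claim 3.
import numpy as np

def attention_loss(attention_maps, mask, theta=0.8):
    """L_ATT({A}, M) = sum_{t=1..N} theta**(N - t) * MSE(A_t, M)."""
    N = len(attention_maps)
    return float(sum(theta ** (N - t) * np.mean((np.asarray(A) - mask) ** 2)
                     for t, A in enumerate(attention_maps, start=1)))
```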
4. The method for removing raindrops from power equipment machine-patrol images according to claim 3, wherein the value of θ is 0.8.
5. The method for removing raindrops from power equipment machine-patrol images according to claim 3, wherein the value of N is 4.
6. The method for removing raindrops from power equipment machine-patrol images according to claim 3, wherein the input of the contextual autoencoder comprises the raindrop-containing image and the attention map output by the attentive recurrent network, and raindrop removal and background restoration are achieved under the guidance of the attention map;
the contextual autoencoder comprises 16 conv-relu modules; the encoder and decoder parts are symmetric in structure, and skip connections are added between corresponding modules to prevent blurring of the output image.
7. The method for removing raindrops from power equipment machine-patrol images according to claim 6, wherein the contextual autoencoder adopts two loss functions, namely a multi-scale loss and a perceptual loss;
the multi-scale loss extracts image feature information from different layers of the decoder, making full use of multi-level image information to optimize the model and obtain a clear raindrop-removed image; the multi-scale loss function L_M({S}, {T}) is as follows:
L_M({S}, {T}) = Σ_{i=1}^{M} λ_i · L_MSE(S_i, T_i)
wherein S_i denotes the image features extracted from the i-th layer of the decoder, T_i denotes the real image scaled to the same size as S_i, λ_i denotes the weight of the i-th layer, M is the total number of layers used, and L_MSE denotes the mean squared error function.
the perceptual loss measures the difference between the raindrop-removed image and the real image from a global perspective, so that the raindrop-removed image is closer to the real sample; the perceptual loss function L_P is calculated as follows:
L_P(O, T) = L_MSE(VGG(O), VGG(T));
wherein VGG is a pre-trained CNN used to extract features from a given input image, O is the output image of the contextual autoencoder, T is the real raindrop-free image sample, and L_MSE denotes the mean squared error function.
8. The method for removing raindrops from power equipment machine-patrol images according to claim 1, wherein the discrimination network is specifically constructed as follows:
a CNN performs feature extraction in the inner layers of the discrimination network on the raindrop-removed image produced by the generation network, and the resulting feature map is combined with the attention map to form the loss function of the local discriminator; the attention map guides the discrimination network to focus on the raindrop regions of the image, and a fully connected layer at the last layer of the discrimination network judges whether the input image is real or fake;
the overall loss function of the discrimination network is as follows:
L_D(O, R, A_N) = −log(D(R)) − log(1 − D(O)) + γ · L_map(O, R, A_N);
wherein γ is 0.05; the first two terms form the loss function of the global discriminator; L_map denotes the loss function of the local discriminator; O is the output image of the contextual autoencoder; A_N is the final attention map output by the attentive recurrent network; and R is a sample image drawn from a library of real sharp (raindrop-free) images;
the loss function L_map(O, R, A_N) of the local discriminator is as follows:
L_map(O, R, A_N) = L_MSE(D_map(O), A_N) + L_MSE(D_map(R), 0);
wherein D_map denotes the two-dimensional attention mask map produced by the discrimination network, and 0 denotes an attention map containing only zero values, i.e., the real image contains no raindrops and the network therefore need not be guided to any region.
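A NumPy sketch of the discriminator loss above (illustrative names; the real D and D_map are the networks described in this claim and claim 9):

```python
# Global GAN terms plus gamma times the local attention-mask term L_map.
import numpy as np

def local_map_loss(dmap_fake, dmap_real, attn_map):
    """L_map = MSE(D_map(O), A_N) + MSE(D_map(R), 0)."""
    return float(np.mean((dmap_fake - attn_map) ** 2) +
                 np.mean(dmap_real ** 2))

def discriminator_loss(d_real, d_fake, dmap_fake, dmap_real, attn_map,
                       gamma=0.05, eps=1e-12):
    """L_D = -log D(R) - log(1 - D(O)) + gamma * L_map(O, R, A_N)."""
    return float(-np.log(d_real + eps) - np.log(1.0 - d_fake + eps)
                 + gamma * local_map_loss(dmap_fake, dmap_real, attn_map))
```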
9. The method for removing raindrops from power equipment machine-patrol images according to claim 8, wherein the discrimination network comprises 9 convolutional layers and a fully connected layer; the kernel of each convolutional layer is (3, 3); the fully connected layer has 1024 units; and a single output neuron adopts a Sigmoid activation function.
10. A system for removing raindrops from power equipment machine-patrol images, comprising a memory, a processor, and computer program instructions stored in the memory and executable by the processor, which, when executed by the processor, implement the method steps of any one of claims 1 to 9.
CN202010923502.7A 2020-09-04 2020-09-04 Method and system suitable for raindrop removal of power equipment machine inspection image Active CN112085678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010923502.7A CN112085678B (en) 2020-09-04 2020-09-04 Method and system suitable for raindrop removal of power equipment machine inspection image


Publications (2)

Publication Number Publication Date
CN112085678A true CN112085678A (en) 2020-12-15
CN112085678B CN112085678B (en) 2024-05-14

Family

ID=73732012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010923502.7A Active CN112085678B (en) 2020-09-04 2020-09-04 Method and system suitable for raindrop removal of power equipment machine inspection image

Country Status (1)

Country Link
CN (1) CN112085678B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529815A (en) * 2020-12-22 2021-03-19 燕山大学 Method and system for removing raindrops in real image after rain
CN112767280A (en) * 2021-02-01 2021-05-07 福州大学 Single image raindrop removing method based on loop iteration mechanism
CN113191969A (en) * 2021-04-17 2021-07-30 南京航空航天大学 Unsupervised image rain removing method based on attention confrontation generation network
CN113450288A (en) * 2021-08-04 2021-09-28 广东工业大学 Single image rain removing method and system based on deep convolutional neural network and storage medium
CN113592011A (en) * 2021-08-06 2021-11-02 太原科技大学 End-to-end raindrop and fog combined removal method and system
CN116645298A (en) * 2023-07-26 2023-08-25 广东电网有限责任公司珠海供电局 Defogging method and device for video monitoring image of overhead transmission line

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090109421A1 (en) * 2007-10-26 2009-04-30 Optex Co., Ltd. Laser area sensor
US20190049540A1 (en) * 2017-08-10 2019-02-14 Siemens Healthcare Gmbh Image standardization using generative adversarial networks
CN109447918A (en) * 2018-11-02 2019-03-08 北京交通大学 Removing rain based on single image method based on attention mechanism
US20190286950A1 (en) * 2018-03-16 2019-09-19 Ebay Inc. Generating a digital image using a generative adversarial network
CN110807749A (en) * 2019-11-06 2020-02-18 广西师范大学 Single image raindrop removing method based on dense multi-scale generation countermeasure network





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant