CN111179262A - Electric power inspection image hardware fitting detection method combined with shape attribute - Google Patents
- Publication number
- CN111179262A (application CN202010002183.6A)
- Authority
- CN
- China
- Prior art keywords
- value
- sample
- distribution
- representing
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a method for detecting hardware fittings in power inspection images by combining shape attributes, which comprises the following steps: 1. global feature extraction: extracting global features using the image classification network VGG-16; 2. region-of-interest acquisition: inputting the extracted global features into a region proposal network; 3. performing classification prediction and bounding box distribution prediction respectively; 4. loss function calculation: calculating the classification loss, and calculating the KL divergence bounding box regression loss combined with shape attributes; 5. model training. The method constrains the regression loss function of the Faster R-CNN model by combining KL divergence with the shape characteristics of hardware targets of the same type, thereby addressing the problems that hardware detection boxes are inaccurately positioned against complex backgrounds and that the structures of some hardware targets are incompletely displayed in the image.
Description
Technical Field
The invention belongs to the technical field of image analysis, and relates to a method for detecting hardware fittings in power inspection images by combining shape attributes.
Background
Ensuring the reliability of power transmission lines is one of the key tasks in building the energy internet and the smart grid. Hardware fittings are extremely important metal parts that exist in large numbers on transmission lines, where they support, fix, connect and protect the bare conductors; at the same time they are failure-prone components, and their good condition underpins the safe operation of the power grid. Transmission lines operate outdoors for long periods and are exposed to various meteorological conditions, in particular galloping and vibration, which inevitably cause fitting faults such as displacement, damage and deflection. High-accuracy automatic detection of hardware targets on transmission lines is the basis for the later diagnosis of fitting faults, and has therefore become increasingly important.
Combining deep learning with power line inspection not only greatly reduces the number of operation and maintenance personnel required and eliminates the false and missed detections caused by human subjective factors, but also improves working efficiency and evaluates the state of the power grid more accurately and effectively. Currently popular object detection frameworks include the R-CNN (Region-based Convolutional Neural Network) model, the Fast R-CNN model and the like; for example, the patent with application number 201610906708.2 discloses a method and system for identifying electric power widgets in unmanned aerial vehicle inspection images based on Faster R-CNN.
The network prediction output consists of two kinds of information: the target category and the target bounding box. However, a target position determined by the detection model solely through bounding box offsets is often inaccurate: the detection score may be high while the detection box is considerably offset, and the position prediction cannot be corrected further in the post-processing NMS (Non-Maximum Suppression) stage.
To achieve high-accuracy automatic detection of hardware targets in transmission line inspection images, and to make the model's regression prediction of hardware bounding boxes more accurate, a more effective bounding box regression method needs to be explored.
Disclosure of Invention
The invention aims to provide a shape-attribute-combined hardware detection method for power inspection images, which constrains the regression loss function of the Faster R-CNN model by combining KL divergence with the shape characteristics of hardware targets of the same type. This solves the problems that hardware detection boxes are inaccurately localized against complex backgrounds and that some hardware targets are incompletely displayed in the image, provides a basis for the further diagnosis of hardware defects, and provides a fundamental guarantee for the safe operation of the power grid.
In order to achieve the above object, the technical solution of the present invention comprises the following steps:
(1) global feature extraction: extracting global features using the image classification network VGG-16;
(2) region-of-interest acquisition: inputting the extracted global features into a region proposal network;
(3) performing classification prediction and bounding box distribution prediction respectively;
(4) loss function calculation: calculating the classification loss, and calculating the KL divergence bounding box regression loss combined with shape attributes;
(5) model training.
Further, in step (1), the convolutional neural network for target feature extraction is taken from the image classification network VGG-16; the network is pre-trained on the ImageNet dataset to generate initial network parameters, the original images in the hardware fitting dataset are input into the network to generate deep feature maps, and the convolutional layer activation functions of the feature extraction network are rectified linear unit (ReLU) functions.
Further, in step (2), proposal regions corresponding to the input image are generated according to preset anchor box values and screened using the non-maximum suppression method.
Further, in step (2), anchor boxes with an IoU value greater than 0.7 are defined as positive samples, anchor boxes with an IoU value less than 0.3 are defined as negative samples, and the remaining samples are discarded; RoI pooling converts feature maps of different input sizes into regions of interest of a fixed size by a block-wise pooling method, and the local features of each region of interest are then input into the fully connected layers of the classification and regression networks for classification and localization.
Furthermore, in step (3), the classification prediction method is as follows: the obtained local feature map of the region of interest is stretched into a vector, an output value is generated by a fully connected network, and the prediction probabilities of each class, summing to 1, are then generated through a softmax function, so as to determine the exact class of the target in the anchor box and obtain the corresponding classification score.
Further, in step (3), the bounding box distribution prediction adopts a one-dimensional Gaussian distribution to fit the data, assuming that the different coordinate values of a sample are independent of each other; the probability density function of the one-dimensional Gaussian distribution is:

P_Φ(x) = exp(−(x − x_e)²/(2σ²)) / √(2πσ²) (1)

where x represents the sample value, x_e represents the expected value of the distribution and is used to estimate the coordinate position of the target, Φ represents the set of network parameters generating each predicted value, and σ represents the standard deviation of the distribution, used to estimate the uncertainty of the predicted position.

Meanwhile, the distribution of the target coordinates as the σ value approaches 0 is taken as the distribution of the true values of the target coordinates in the dataset, giving the expression for the distribution of the coordinate true values shown in formula (2):

Q_D(x) = δ(x − x_g) (2)

where x represents the sample value, x_g represents the true value of the coordinate, D represents the distribution of x_g, and δ represents the Dirac function.
Further, in step (4), the classification loss retains the function form of the original Faster R-CNN model:

L_cls(p_i, p_i*) = −log[p_i*·p_i + (1 − p_i*)(1 − p_i)] (3)

where p_i is the class prediction probability of the object in the i-th anchor box, i ∈ [1, 2000], and p_i* is the category label of the object in the i-th anchor box.
Further, in step (4), in the calculation of the KL divergence bounding box regression loss combined with shape attributes, the KL divergence is defined by the formula:

D_KL(P ‖ Q) = ∫ P(x) log(P(x)/Q(x)) dx (4)

where x denotes the sample value and P and Q are two different probability distributions.
Further, in step (4), in the calculation of the KL divergence bounding box regression loss combined with shape attributes, the bounding box regression loss is calculated as follows. The KL-divergence-based bounding box regression loss function is defined as:

L_kl = D_KL(Q_D(x) ‖ P_Φ(x)) (5)

where x represents the sample value, P_Φ(x) represents the probability distribution of the predicted coordinates, and Q_D(x) represents the probability distribution of the real label coordinates.

The closer the predicted coordinate probability distribution is to the real label coordinate probability distribution, the smaller the KL divergence between them, so a smaller KL divergence value indicates a better fit. Further simplification of formula (5), omitting the constant terms that do not depend on the network outputs, yields:

L_kl = (x_g − x_e)²/(2σ²) + log(σ²)/2 (6)

where x_g represents the true label coordinate and x_e represents the predicted label coordinate.

Taking partial derivatives of formula (6) with respect to the variables x_e and σ, and, for numerical stability, using the variable α in place of σ as the network output according to formula (7), gives the loss function of formula (8) after the substitution:

α = log(σ²) (7)

L_kl = e^(−α)·(x_g − x_e)²/2 + α/2 (8)

where x_g represents the true label coordinate and x_e represents the predicted label coordinate.
Covariance values are calculated for the widths and heights of samples of the same class, with the covariance defined as formula (9):

cov(w, h) = E[(w − E(w))·(h − E(h))] (9)

In formula (9), w represents the predicted width value of a sample during model propagation, h represents the predicted height value of a sample during model propagation, E represents the mathematical expectation, and cov represents the covariance function; the covariance expresses the linear correlation between random variables. Since formula (9) is applied to discrete random variables, the covariance can be computed from sample means, specifically by formula (10):

cov(w, h) = (1/n)·Σ_{i=1}^{n} (w_i − w̄)(h_i − h̄) (10)

In formula (10), w̄ and h̄ respectively represent the mean width and mean height of the sample targets. Assuming that the height and width of the target belong to Euclidean space and constraining them with the L2 norm yields the shape constraint function ι shown in formula (11), in which w represents the predicted width value of the sample during model propagation, h represents the predicted height value, w̄ represents the sample target width mean, and h̄ represents the sample target height mean.

Adding the shape constraint function to the bounding box regression loss function of formula (8) gives the final model bounding box regression loss function:

L_reg = L_kl + ι (12)

where L_reg represents the bounding box regression loss function, L_kl represents the KL-divergence-based bounding box loss function, and ι represents the shape constraint function.
Further, in step (5), the constructed model is trained on the transmission line inspection image hardware dataset, where the ratio of training set images to test set images is 8:2, the initial learning rate is 0.001, and the number of iterations is 30000. After the trained hardware target detection model is obtained, the image to be detected is input into the model file with the highest iteration number, and accurate position information and confidence information of the hardware targets are obtained directly through forward propagation of the image.

The invention has the following positive effects:
the invention provides a hardware detection method for an electric power inspection image in combination with shape attributes, which combines KL divergence and shape characteristics of hardware targets of the same type to constrain regression loss functions of a boundary frame on the basis of a Faster R-CNN model so as to solve the problems that the detection frame of the hardware targets is not accurately positioned in a complex background and the structures of part of the hardware targets are not completely displayed in the image, obviously improve the detection precision, provide a basis for further diagnosing the defects of the hardware targets and provide a basic guarantee for the safe operation of a power grid.
Description of the drawings:
FIG. 1 is a block flow diagram of the method of the present invention.
FIG. 2 shows the detection results obtained in Example 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to better illustrate the invention, the following examples are given by way of further illustration.
Example 1
The Faster R-CNN model is prior art; the invention improves upon it and provides a more effective bounding box regression function.
As shown in FIG. 1, the invention provides a method for detecting hardware fittings in power inspection images combined with shape attributes, which comprises the following steps.
(1) Global feature extraction: extracting global features of the hardware fitting images using the image classification network VGG-16.
(2) Region-of-interest acquisition: inputting the extracted global features into the RPN network.
(3) Performing classification prediction and bounding box distribution prediction respectively.
(4) Loss function calculation: calculating the classification loss, and calculating the KL divergence bounding box regression loss combined with shape attributes. Classification loss calculation follows classification prediction; the KL divergence bounding box regression loss calculation combined with shape attributes follows bounding box distribution prediction, and combines the fitting shape attributes with the KL divergence loss calculation: first, the KL-divergence-based bounding box regression loss function is obtained; then, exploiting the fact that fittings of the same type have a fixed aspect ratio, the loss function is modified again, adding the covariance term to obtain the final bounding box regression loss function.
(5) Model training.
The steps are explained as follows:
(1) Global feature extraction: extracting global features of the hardware fitting images using the image classification network VGG-16.
Specifically, in step (1), the convolutional neural network for target feature extraction is taken from the existing classical image classification network VGG-16. The network is pre-trained on the large public dataset ImageNet to generate initial network parameters, the original images in the hardware fitting dataset are input into the network to generate deep feature maps, and the convolutional layer activation functions of the feature extraction network are Rectified Linear Unit (ReLU) functions.
(2) Region-of-interest acquisition: the extracted global features are input into the Region Proposal Network (RPN).
Specifically, proposal regions corresponding to the input image are generated according to preset anchor box values and screened using the non-maximum suppression method, as sketched below.
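For illustration, the screening step can be sketched as a greedy non-maximum suppression routine in Python/NumPy; the [x1, y1, x2, y2] box format, the example boxes and the reuse of the 0.7 threshold are assumptions of this sketch, not details fixed by the patent:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.
    Returns the indices of the kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # visit highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the current best box with all remaining candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]: the near-duplicate of box 0 is suppressed
```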
Specifically, in step (2), anchor boxes with an IoU (Intersection over Union) value greater than 0.7 are defined as positive samples, anchor boxes with an IoU value less than 0.3 are defined as negative samples, and the remaining samples are discarded;
the method comprises the steps that the characteristic graphs with different input sizes are divided into regions of interest with fixed sizes by the aid of the RoI pooling through a blocking pooling method, the fixed size is 7 x 7, and then local characteristics of the regions of interest are input into full-connection layers of a classification network and a regression network for classification and positioning.
(3) Performing classification prediction and bounding box distribution prediction respectively.
In step (3), the classification prediction method is as follows: the obtained local feature map of the region of interest is stretched into a vector, an output value is generated by a fully connected network, and the prediction probabilities of each class, summing to 1, are generated through a softmax function, so as to judge the exact class of the target in the anchor box and obtain the corresponding classification score.
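The stretch, fully connected layer and softmax chain can be sketched as follows; the weight matrix W, bias b and the four-class setting are hypothetical stand-ins for trained parameters:

```python
import numpy as np

def classify_roi(roi_feature, W, b):
    """Stretch the pooled RoI feature into a vector, apply one fully
    connected layer, and convert the output values into per-class
    probabilities that sum to 1 via softmax."""
    x = roi_feature.ravel()                  # stretch the local feature map
    logits = W @ x + b                       # fully connected output values
    logits = logits - logits.max()           # numerical stabilisation
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(probs.argmax()), probs        # predicted class and its scores

rng = np.random.default_rng(0)
feat = rng.random((7, 7))                    # hypothetical pooled RoI feature
W, b = rng.standard_normal((4, 49)), rng.standard_normal(4)
cls, probs = classify_roi(feat, W, b)
print(cls, probs.sum())                      # class index; probabilities sum to 1
```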
In step (3), the bounding box distribution prediction adopts a one-dimensional Gaussian distribution to fit the data, assuming that the different coordinate values of a sample are independent of each other; the probability density function of the one-dimensional Gaussian distribution is:

P_Φ(x) = exp(−(x − x_e)²/(2σ²)) / √(2πσ²) (1)

where x represents the sample value, x_e represents the expected value of the distribution and is used to estimate the coordinate position of the target, Φ represents the set of network parameters generating each predicted value, and σ represents the standard deviation of the distribution, used to estimate the uncertainty of the predicted position.

Meanwhile, the distribution of the target coordinates as the σ value approaches 0 is taken as the distribution of the true values of the target coordinates in the dataset, giving the expression for the distribution of the coordinate true values shown in formula (2):

Q_D(x) = δ(x − x_g) (2)

where x represents the sample value, x_g represents the true value of the coordinate, D represents the distribution of x_g, and δ represents the Dirac function.
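A small numerical sketch of formula (1); the printed values illustrate how the density concentrates at the predicted coordinate as σ shrinks, the σ → 0 limit being the Dirac ground-truth distribution of formula (2). The coordinate values are hypothetical:

```python
import numpy as np

def gaussian_pdf(x, x_e, sigma):
    """Formula (1): a 1-D Gaussian whose mean x_e is the predicted
    coordinate and whose standard deviation sigma expresses the
    uncertainty of that prediction."""
    return np.exp(-(x - x_e) ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi * sigma ** 2)

# As sigma shrinks the density piles up at x_e: the density at the mean
# grows like 1/sigma, approaching the Dirac delta used in formula (2).
for s in (5.0, 1.0, 0.1):
    print(s, gaussian_pdf(100.0, 100.0, s))
```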
(4) Loss function calculation: calculating the classification loss, and calculating the KL divergence bounding box regression loss combined with shape attributes. Classification loss calculation follows classification prediction; the KL divergence bounding box regression loss calculation combined with shape attributes follows bounding box distribution prediction, and combines the fitting shape attributes with the KL divergence loss calculation: first, the KL-divergence-based bounding box regression loss function is obtained; then, exploiting the fact that fittings of the same type have a fixed aspect ratio, the loss function is modified again, adding the covariance term to obtain the final bounding box regression loss function.
In step (4), the classification loss retains the function form of the original Faster R-CNN model:

L_cls(p_i, p_i*) = −log[p_i*·p_i + (1 − p_i*)(1 − p_i)] (3)

where p_i is the class prediction probability of the object in the i-th anchor box, i ∈ [1, 2000], and p_i* is the category label of the object in the i-th anchor box.
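Formula (3) can be evaluated directly; the probabilities below are hypothetical:

```python
import numpy as np

def cls_loss(p, p_star):
    """Formula (3): -log[p* . p + (1 - p*)(1 - p)]. With a binary label
    p* in {0, 1} this is the usual cross-entropy: -log(p) for a positive
    anchor and -log(1 - p) for a negative one."""
    return -np.log(p_star * p + (1.0 - p_star) * (1.0 - p))

print(cls_loss(0.9, 1))   # confident and correct -> small loss (~0.105)
print(cls_loss(0.9, 0))   # confident and wrong   -> large loss (~2.303)
```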
In step (4), in the calculation of the KL divergence bounding box regression loss combined with shape attributes, the KL divergence is defined by the formula:

D_KL(P ‖ Q) = ∫ P(x) log(P(x)/Q(x)) dx (4)

where x denotes the sample value and P and Q are two different probability distributions.
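A discretised evaluation of formula (4) on a grid, assuming both densities stay strictly positive there; the second printed value matches the closed form (μ₁ − μ₂)²/2 for two unit-variance Gaussians:

```python
import numpy as np

def kl_divergence(p, q, dx):
    """Discretised formula (4): the integral of P(x) log(P(x)/Q(x)) dx,
    approximated on a grid with spacing dx."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
g = lambda m, s: np.exp(-(x - m) ** 2 / (2 * s * s)) / np.sqrt(2 * np.pi * s * s)
print(kl_divergence(g(0, 1), g(0, 1), dx))   # ~0: identical distributions
print(kl_divergence(g(0, 1), g(2, 1), dx))   # ~2: shifted mean increases D_KL
```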
In step (4), in the calculation of the KL divergence bounding box regression loss combined with shape attributes, the bounding box loss is calculated as follows. The KL-divergence-based bounding box regression loss function is defined as:

L_kl = D_KL(Q_D(x) ‖ P_Φ(x)) (5)

where x represents the sample value, P_Φ(x) represents the probability distribution of the predicted coordinates, Q_D(x) represents the probability distribution of the real label coordinates, and D_KL is obtained from formula (4).

The closer the predicted coordinate probability distribution is to the real label coordinate probability distribution, the smaller the KL divergence between them, so a smaller KL divergence value indicates a better fit. Further simplification of formula (5), omitting the constant terms that do not depend on the network outputs, yields:

L_kl = (x_g − x_e)²/(2σ²) + log(σ²)/2 (6)

where x_g represents the true label coordinate and x_e represents the predicted label coordinate. Taking partial derivatives of formula (6) with respect to the variables x_e and σ, and, for numerical stability, using the variable α in place of σ as the network output according to formula (7), gives the loss function of formula (8) after the substitution:

α = log(σ²) (7)

L_kl = e^(−α)·(x_g − x_e)²/2 + α/2 (8)

where x_g represents the true label coordinate and x_e represents the predicted label coordinate.
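The closed-form loss of formula (8) is straightforward to evaluate; the coordinates and α values below are hypothetical and only illustrate how a large predicted uncertainty softens the penalty on the same localisation error:

```python
import numpy as np

def kl_reg_loss(x_g, x_e, alpha):
    """Formula (8): per-coordinate regression loss after substituting
    alpha = log(sigma^2); constant terms are dropped since they do not
    affect the gradient."""
    return 0.5 * np.exp(-alpha) * (x_g - x_e) ** 2 + 0.5 * alpha

# The same 5-pixel error costs less when the network also reports a
# high uncertainty (large alpha) and more when it claims confidence.
print(kl_reg_loss(105.0, 100.0, alpha=0.0))   # confident: 12.5
print(kl_reg_loss(105.0, 100.0, alpha=4.0))   # uncertain: ~2.23
```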
Whether the L1 norm or the aforementioned KL divergence is used as the loss function, the correlation between the shapes of fittings at different positions in the image is ignored when measuring the distance between the distribution of the model-predicted bounding box positions and the distribution of the true bounding box positions. The position coordinates of the target samples obtained through the introduced bounding box distribution prediction allow the similarity in shape distribution of same-type hardware targets in the dataset, i.e. the high linear correlation of their aspect ratios, to be taken into account. Based on this characteristic, covariance values are calculated for the widths and heights of samples of the same class, with the covariance defined as formula (9):

cov(w, h) = E[(w − E(w))·(h − E(h))] (9)

In formula (9), w represents the predicted width value of a sample during model propagation, h represents the predicted height value of a sample during model propagation, E represents the mathematical expectation, and cov represents the covariance function; the covariance expresses the linear correlation between random variables. Since formula (9) is applied to discrete random variables, the covariance can be computed from sample means, specifically by formula (10):

cov(w, h) = (1/n)·Σ_{i=1}^{n} (w_i − w̄)(h_i − h̄) (10)

In formula (10), w̄ and h̄ respectively represent the mean width and mean height of the sample targets. Assuming that the height and width of the target belong to Euclidean space and constraining them with the L2 norm yields the shape constraint function ι shown in formula (11), in which w represents the predicted width value of the sample during model propagation, h represents the predicted height value, w̄ represents the sample target width mean, and h̄ represents the sample target height mean.
Adding the shape constraint function to the bounding box regression loss function of formula (8) gives the final model regression loss function:

L_reg = L_kl + ι (12)

where L_reg represents the bounding box regression loss function, L_kl represents the KL-divergence-based bounding box loss function, and ι represents the shape constraint function.
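A sketch of the combined loss of formula (12). The exact form of the shape constraint ι, formula (11), is not reproduced in the source text, so the ι below is an assumed L2 penalty pulling the predicted width and height toward the class means; it illustrates the structure of the combination rather than the patented form of ι:

```python
import numpy as np

def total_reg_loss(x_g, x_e, alpha, w, h, w_mean, h_mean):
    """Formula (12): L_reg = L_kl + iota. L_kl is formula (8); the shape
    term iota is an ASSUMED L2 distance between the predicted (w, h) and
    the class means, standing in for the unreproduced formula (11)."""
    l_kl = 0.5 * np.exp(-alpha) * (x_g - x_e) ** 2 + 0.5 * alpha
    iota = np.sqrt((w - w_mean) ** 2 + (h - h_mean) ** 2)   # assumed form
    return l_kl + iota

print(total_reg_loss(x_g=105.0, x_e=100.0, alpha=1.0,
                     w=40.0, h=84.0, w_mean=41.0, h_mean=82.0))
```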
(5) Model training.
In step (5), the constructed model is trained on the transmission line inspection image hardware dataset, where the ratio of training set images to test set images is 8:2, the initial learning rate is 0.001, and the number of iterations is 30000. After the trained hardware target detection model is obtained, the image to be detected is input into the model file with the highest iteration number, and accurate position information and confidence information of the hardware targets are obtained directly through forward propagation of the image.
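The stated training configuration can be captured in a short sketch; the sample identifiers and the random seed are hypothetical, and solver details beyond the two stated values are not specified in the source:

```python
import random

def split_dataset(image_ids, train_ratio=0.8, seed=0):
    """8:2 train/test split of the inspection image hardware dataset."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

config = {"initial_learning_rate": 0.001, "max_iterations": 30000}
train_ids, test_ids = split_dataset(range(1000))    # hypothetical 1000 images
print(len(train_ids), len(test_ids), config)        # 800 200 {...}
```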
(6) Testing: the trained model is tested using known data.
Example 2
The trained model was obtained according to the method of Example 1, and images were processed using the pre-trained Faster R-CNN target detection model based on the shape loss function.
The method of Example 1 comprises the following steps:
(1) global feature extraction: extracting global features using the image classification network VGG-16;
(2) region-of-interest acquisition: inputting the extracted global features into the region proposal network;
(3) performing classification prediction and bounding box distribution prediction respectively;
(4) loss function calculation: calculating the classification loss, and calculating the KL divergence bounding box regression loss combined with shape attributes;
(5) model training.
The basic parameter settings of the Faster R-CNN model are: batch size 64, initial learning rate 0.001, maximum number of iterations 70000, and backbone network ResNet-50. The server GPU used for the experiment was a GTX 1080 Ti, and the method was implemented under the TensorFlow deep learning framework. The results are shown in FIG. 2. It can be seen that the predicted bounding box and the annotated bounding box of the vibration damper overlap each other, no serious detection box deformation occurs, the detection box does not contain excessive redundant information, and it fits the real target closely.
The invention provides a shape-attribute-combined hardware detection method for power inspection images, in which the shape features of the different types of hardware targets in the dataset are added to the loss function as constraints. This solves the problems that hardware detection boxes are inaccurately localized against complex backgrounds and that the structures of some hardware targets are incompletely displayed in the image, significantly improves detection precision, provides a basis for the further diagnosis of hardware defects, and provides a fundamental guarantee for the safe operation of the power grid. The invention is practical and offers a useful reference for the design of solutions to related problems.
Claims (10)
1. A method for detecting hardware fittings in power inspection images combined with shape attributes, characterized by comprising the following steps:
(1) global feature extraction: extracting global features using the image classification network VGG-16;
(2) region-of-interest acquisition: inputting the extracted global features into a region proposal network;
(3) performing classification prediction and bounding box distribution prediction respectively;
(4) loss function calculation: calculating the classification loss, and calculating the KL divergence bounding box regression loss combined with shape attributes;
(5) model training.
2. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 1, characterized in that: in step (1), the convolutional neural network for target feature extraction is taken from the image classification network VGG-16; the network is pre-trained on the ImageNet dataset to generate initial network parameters, the original images in the hardware fitting dataset are input into the network to generate deep feature maps, and the convolutional layer activation functions of the feature extraction network are rectified linear unit (ReLU) functions.
3. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 1, characterized in that: in step (2), proposal regions corresponding to the input image are generated according to preset anchor box values and screened using the non-maximum suppression method.
4. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 3, characterized in that: in step (2), anchor boxes with an IoU value greater than 0.7 are defined as positive samples, anchor boxes with an IoU value less than 0.3 are defined as negative samples, and the remaining samples are discarded; RoI pooling converts feature maps of different input sizes into regions of interest of a fixed size by a block-wise pooling method, and the local features of each region of interest are then input into the fully connected layers of the classification and regression networks for classification and localization.
5. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 3, characterized in that: in step (3), the classification prediction method is as follows: the obtained local feature map of the region of interest is stretched into a vector, an output value is generated by a fully connected network, and the prediction probabilities of each class, summing to 1, are generated through a softmax function, so as to judge the exact class of the target in the anchor box and obtain the corresponding classification score.
6. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 3, characterized in that: in step (3), the bounding box distribution prediction adopts a one-dimensional Gaussian distribution to fit the data, assuming that the different coordinate values of a sample are independent of each other; the probability density function of the one-dimensional Gaussian distribution is:

P_Φ(x) = exp(−(x − x_e)²/(2σ²)) / √(2πσ²) (1)

where x represents the sample value, x_e represents the expected value of the distribution and is used to estimate the coordinate position of the target, Φ represents the set of network parameters generating each predicted value, and σ represents the standard deviation of the distribution, used to estimate the uncertainty of the predicted position;

meanwhile, the distribution of the target coordinates as the σ value approaches 0 is taken as the distribution of the true values of the target coordinates in the dataset, giving the expression for the distribution of the coordinate true values shown in formula (2):

Q_D(x) = δ(x − x_g) (2)

where x represents the sample value, x_g represents the true value of the coordinate, D represents the distribution of x_g, and δ represents the Dirac function.
7. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 1, characterized in that: in step (4), the classification loss retains the function form of the original Faster R-CNN model:

L_cls(p_i, p_i*) = −log[p_i*·p_i + (1 − p_i*)(1 − p_i)] (3)

where p_i is the class prediction probability of the object in the i-th anchor box, i ∈ [1, 2000], and p_i* is the category label of the object in the i-th anchor box.
8. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 1, characterized in that: in step (4), in the calculation of the KL divergence bounding box regression loss combined with shape attributes, the KL divergence is defined by the formula:

D_KL(P ‖ Q) = ∫ P(x) log(P(x)/Q(x)) dx (4)

where x denotes the sample value and P and Q are two different probability distributions.
9. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 8, characterized in that: in step (4), in the calculation of the KL divergence bounding box regression loss combined with shape attributes, the bounding box regression loss is calculated as follows: the KL-divergence-based bounding box regression loss function is defined as:

L_kl = D_KL(Q_D(x) ‖ P_Φ(x)) (5)

where x represents the sample value, P_Φ(x) represents the probability distribution of the predicted coordinates, and Q_D(x) represents the probability distribution of the real label coordinates;

the closer the predicted coordinate probability distribution is to the real label coordinate probability distribution, the smaller the KL divergence between them, so a smaller KL divergence value indicates a better fit; further simplification of formula (5), omitting the constant terms that do not depend on the network outputs, yields:

L_kl = (x_g − x_e)²/(2σ²) + log(σ²)/2 (6)

where x_g represents the true label coordinate and x_e represents the predicted label coordinate;

taking partial derivatives of formula (6) with respect to the variables x_e and σ, and using the variable α in place of σ as the output according to formula (7), the loss function after the substitution is formula (8):

α = log(σ²) (7)

L_kl = e^(−α)·(x_g − x_e)²/2 + α/2 (8)

where x_g represents the true label coordinate and x_e represents the predicted label coordinate;
covariance values are calculated for the widths and heights of samples of the same class, with the covariance defined as formula (9):

cov(w, h) = E[(w − E(w))·(h − E(h))] (9)

in formula (9), w represents the predicted width value of a sample during model propagation, h represents the predicted height value of a sample during model propagation, E represents the mathematical expectation, and cov represents the covariance function, the covariance expressing the linear correlation between random variables; since formula (9) is applied to discrete random variables, the covariance can be computed from sample means, specifically by formula (10):

cov(w, h) = (1/n)·Σ_{i=1}^{n} (w_i − w̄)(h_i − h̄) (10)

in formula (10), w̄ and h̄ respectively represent the mean width and mean height of the sample targets; assuming that the height and width of the target belong to Euclidean space and constraining them with the L2 norm yields the shape constraint function ι shown in formula (11), in which w represents the predicted width value of the sample during model propagation, h represents the predicted height value, w̄ represents the sample target width mean, and h̄ represents the sample target height mean;

adding the shape constraint function to the bounding box regression loss function of formula (8) gives the final model bounding box regression loss function:

L_reg = L_kl + ι (12)

where L_reg represents the bounding box regression loss function, L_kl represents the KL-divergence-based bounding box loss function, and ι represents the shape constraint function.
10. The method for detecting hardware fittings in power inspection images combined with shape attributes according to claim 1, characterized in that: in step (5), the constructed model is trained on the transmission line inspection image hardware dataset, where the ratio of training set images to test set images is 8:2, the initial learning rate is 0.001, and the number of iterations is 30000; after the trained hardware target detection model is obtained, the image to be detected is input into the model file with the highest iteration number, and accurate position information and confidence information of the hardware targets are obtained directly through forward propagation of the image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010002183.6A CN111179262B (en) | 2020-01-02 | 2020-01-02 | Electric power inspection image hardware fitting detection method combining shape attribute |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010002183.6A CN111179262B (en) | 2020-01-02 | 2020-01-02 | Electric power inspection image hardware fitting detection method combining shape attribute |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111179262A true CN111179262A (en) | 2020-05-19 |
CN111179262B CN111179262B (en) | 2024-09-06 |
Family
ID=70647516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010002183.6A Active CN111179262B (en) | 2020-01-02 | 2020-01-02 | Electric power inspection image hardware fitting detection method combining shape attribute |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111179262B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111812096A (en) * | 2020-06-02 | 2020-10-23 | 国网浙江嘉善县供电有限公司 | Rapid positioning intelligent image detection method for insulator arc burn |
CN111881907A (en) * | 2020-06-22 | 2020-11-03 | 浙江大华技术股份有限公司 | Frame regression positioning method and device and electronic equipment |
CN112508168A (en) * | 2020-09-25 | 2021-03-16 | 上海海事大学 | Frame regression neural network construction method based on automatic correction of prediction frame |
CN112906799A (en) * | 2021-02-25 | 2021-06-04 | 深圳前海微众银行股份有限公司 | Regression learning adjusting method, device and system and computer readable storage medium |
CN113469224A (en) * | 2021-06-16 | 2021-10-01 | 浙江大学 | Rice classification method based on fusion of convolutional neural network and feature description operator |
CN113780358A (en) * | 2021-08-16 | 2021-12-10 | 华北电力大学(保定) | Real-time hardware fitting detection method based on anchor-free network |
CN115546568A (en) * | 2022-12-01 | 2022-12-30 | 合肥中科类脑智能技术有限公司 | Insulator defect detection method, system, equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106504233A (en) * | 2016-10-18 | 2017-03-15 | 国网山东省电力公司电力科学研究院 | Image electric power widget recognition methodss and system are patrolled and examined based on the unmanned plane of Faster R CNN |
CN107392901A (en) * | 2017-07-24 | 2017-11-24 | 国网山东省电力公司信息通信公司 | A kind of method for transmission line part intelligence automatic identification |
US20190102646A1 (en) * | 2017-10-02 | 2019-04-04 | Xnor.ai Inc. | Image based object detection |
CN109711288A (en) * | 2018-12-13 | 2019-05-03 | 西安电子科技大学 | Remote sensing ship detecting method based on feature pyramid and distance restraint FCN |
CN110211097A (en) * | 2019-05-14 | 2019-09-06 | 河海大学 | Crack image detection method based on fast R-CNN parameter migration |
CN110287777A (en) * | 2019-05-16 | 2019-09-27 | 西北大学 | A kind of golden monkey body partitioning algorithm under natural scene |
- 2020-01-02 CN CN202010002183.6A patent/CN111179262B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106504233A (en) * | 2016-10-18 | 2017-03-15 | 国网山东省电力公司电力科学研究院 | Image electric power widget recognition methodss and system are patrolled and examined based on the unmanned plane of Faster R CNN |
CN107392901A (en) * | 2017-07-24 | 2017-11-24 | 国网山东省电力公司信息通信公司 | A kind of method for transmission line part intelligence automatic identification |
US20190102646A1 (en) * | 2017-10-02 | 2019-04-04 | Xnor.ai Inc. | Image based object detection |
CN109711288A (en) * | 2018-12-13 | 2019-05-03 | 西安电子科技大学 | Remote sensing ship detecting method based on feature pyramid and distance restraint FCN |
CN110211097A (en) * | 2019-05-14 | 2019-09-06 | 河海大学 | Crack image detection method based on fast R-CNN parameter migration |
CN110287777A (en) * | 2019-05-16 | 2019-09-27 | 西北大学 | A kind of golden monkey body partitioning algorithm under natural scene |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111812096A (en) * | 2020-06-02 | 2020-10-23 | 国网浙江嘉善县供电有限公司 | Rapid positioning intelligent image detection method for insulator arc burn |
CN111812096B (en) * | 2020-06-02 | 2023-07-07 | 国网浙江嘉善县供电有限公司 | Rapid positioning intelligent image detection method for insulator arc burn |
CN111881907A (en) * | 2020-06-22 | 2020-11-03 | 浙江大华技术股份有限公司 | Frame regression positioning method and device and electronic equipment |
CN111881907B (en) * | 2020-06-22 | 2021-07-27 | 浙江大华技术股份有限公司 | Frame regression positioning method and device and electronic equipment |
CN112508168A (en) * | 2020-09-25 | 2021-03-16 | 上海海事大学 | Frame regression neural network construction method based on automatic correction of prediction frame |
CN112508168B (en) * | 2020-09-25 | 2023-09-22 | 上海海事大学 | Frame regression neural network construction method based on automatic correction of prediction frame |
CN112906799A (en) * | 2021-02-25 | 2021-06-04 | 深圳前海微众银行股份有限公司 | Regression learning adjusting method, device and system and computer readable storage medium |
CN112906799B (en) * | 2021-02-25 | 2024-07-02 | 深圳前海微众银行股份有限公司 | Regression learning adjustment method, device and system and computer readable storage medium |
CN113469224A (en) * | 2021-06-16 | 2021-10-01 | 浙江大学 | Rice classification method based on fusion of convolutional neural network and feature description operator |
CN113780358A (en) * | 2021-08-16 | 2021-12-10 | 华北电力大学(保定) | Real-time hardware fitting detection method based on anchor-free network |
CN115546568A (en) * | 2022-12-01 | 2022-12-30 | 合肥中科类脑智能技术有限公司 | Insulator defect detection method, system, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111179262B (en) | 2024-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111179262A (en) | Electric power inspection image hardware fitting detection method combined with shape attribute | |
CN107742093B (en) | Real-time detection method, server and system for infrared image power equipment components | |
US20230099113A1 (en) | Training method and apparatus for a target detection model, target detection method and apparatus, and medium | |
CN107330437B (en) | Feature extraction method based on convolutional neural network target real-time detection model | |
CN111353413A (en) | Low-missing-report-rate defect identification method for power transmission equipment | |
CN113139453B (en) | Orthoimage high-rise building base vector extraction method based on deep learning | |
CN111650204A (en) | Transmission line hardware defect detection method and system based on cascade target detection | |
CN110443258B (en) | Character detection method and device, electronic equipment and storage medium | |
CN114155244B (en) | Defect detection method, device, equipment and storage medium | |
CN112101138A (en) | Bridge inhaul cable surface defect real-time identification system and method based on deep learning | |
CN112365497A (en) | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures | |
CN112989995B (en) | Text detection method and device and electronic equipment | |
CN116612098B (en) | Insulator RTV spraying quality evaluation method and device based on image processing | |
CN111860593A (en) | Fan blade fault detection method based on generation countermeasure network | |
CN113947188A (en) | Training method of target detection network and vehicle detection method | |
CN114359695A (en) | Insulator breakage identification method based on uncertainty estimation | |
CN115457391A (en) | Magnetic flux leakage internal detection method and system for pipeline and related components | |
CN112084860A (en) | Target object detection method and device and thermal power plant detection method and device | |
CN111915595A (en) | Image quality evaluation method, and training method and device of image quality evaluation model | |
CN115239646A (en) | Defect detection method and device for power transmission line, electronic equipment and storage medium | |
CN114897858A (en) | Rapid insulator defect detection method and system based on deep learning | |
CN117523420B (en) | Lightning falling area identification method and system based on radar product data | |
CN113781436B (en) | High-voltage switch state monitoring method based on camera and angle measurement | |
CN111507249A (en) | Transformer substation nest identification method based on target detection | |
Wei et al. | Insulator defect detection in transmission line based on an improved lightweight YOLOv5s algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||