Method for detecting abnormal target of power transmission line
Technical Field
The invention belongs to the technical field of image data processing, and particularly relates to a method for detecting an abnormal target of a power transmission line.
Background
The power transmission line is one of the most important infrastructures of the energy Internet in China, and the safe and stable operation of the line is an important precondition for electric energy transmission. In recent years, the external environment of the power grid has gradually deteriorated, so that the safety of the power grid faces a severe test. At present, there are two main methods for inspecting power transmission lines for foreign matter: manual inspection and unmanned aerial vehicle inspection. Given the variable climate and complex terrain in China, guaranteeing the operation of its ultra-large-scale power network is a difficult task. For inspection workers, the work is dangerous and labor-intensive, and because the skill of the workers is uneven, missed and false detections occur from time to time and inspection efficiency is low. Inspection personnel must carry professional equipment deep into mountains, canyons and remote forests and climb iron towers tens of meters high to inspect the grid, and they must carry out emergency inspections even in bad weather. If a high-altitude accident such as a fall or an electric shock occurs during inspection, the consequences are severe. Unmanned aerial vehicle inspection subsequently emerged. An unmanned aerial vehicle has a comprehensive aerial view, which greatly enlarges the field of view compared with manual inspection, and it is much faster than a human inspector; it offers a wide field of view, good mobility, timeliness and a wide inspection range.
However, the large amount of image data transmitted back by the unmanned aerial vehicle must still be examined manually to judge whether foreign matter exists on the line, so this method is also limited.
With the development of computer vision and deep learning, methods for detecting abnormal targets of power transmission lines have developed as well, and have even become a research hotspot in the field of target detection. Chinese patent application publication No. CN109493337A discloses a method for detecting foreign matter on a power transmission line based on an improved Faster RCNN. That method improves the network structure of Faster RCNN, changes the convolution kernel size, the number of neurons and the hyper-parameters of the network, adjusts the size ratios of the anchors (anchor boxes) in Faster RCNN, replaces the original training mode of Faster RCNN with end-to-end joint training, and then preprocesses the collected foreign matter images and feeds them into the trained network to detect foreign matter. The foreign matter detection method proposed in that patent application has the following disadvantages: (1) a shared convolutional neural network extracts features from the input image and the extracted features are sent directly to the RPN (region proposal network) without combining the features of feature maps of different resolutions, so the features are not fully utilized; (2) the conventional RPN generates a large number of anchors, most of which lie in the background area and contribute nothing to detection performance.
In addition to Faster RCNN, Mask RCNN can also be used for target detection. If Mask RCNN is used to detect foreign matter on a power transmission line, the features extracted by the ResNet network can be fused by a feature pyramid, but the feature pyramid in Mask RCNN only fuses features of adjacent resolutions, so non-adjacent features cannot be fully utilized. Therefore, the accuracy of existing methods for detecting abnormal targets of power transmission lines is not high.
The above information disclosed in this background section is only for enhancing understanding of the background of the application, and it may therefore contain information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a method for detecting an abnormal target of a power transmission line so as to improve the detection accuracy.
To achieve this aim, the invention adopts the following technical solution:
A method for detecting an abnormal target of a power transmission line, characterized by comprising the following steps:
acquiring an original power transmission line image;
extracting features of the original power transmission line image by using a residual network and a feature pyramid to obtain a multi-scale feature pyramid map set;
fusing the multi-scale feature pyramid map set to obtain a fused feature map;
inputting the fused feature map into a non-local network, performing feature enhancement, and generating an enhanced feature pyramid map set;
inputting each feature map in the enhanced feature pyramid map set into a feature-guided anchor generation mechanism, predicting the position and the shape of the anchor respectively, combining the position and the shape of the anchor to generate a target anchor, and performing feature adaptive adjustment on the enhanced feature pyramid map set by using the target anchor to generate a corrected feature pyramid map set;
and detecting the corrected feature pyramid map set by using a detector, correcting the detection result by using a loss function, and outputting a final target detection result.
Compared with the prior art, the invention has the following advantages and positive effects. In the method for detecting an abnormal target of a power transmission line, a residual network and a feature pyramid extract the features of the original power transmission line image to obtain a multi-scale feature pyramid map set, and a non-local network then enhances the feature map obtained by fusing that set. This has a denoising effect and gives the feature map a higher resolution, which effectively addresses the unclear images and inaccurate feature extraction found in power transmission line abnormal target data sets and helps to improve detection accuracy. In addition, a feature-guided anchor generation mechanism predicts the position and the shape of the anchor from the feature maps of the enhanced feature pyramid map set, generates the target anchor, and adaptively adjusts the feature maps based on the target anchor. This improves the efficiency of anchor generation, produces sparse anchors of arbitrary shape, improves the quality of the proposed regions, facilitates classification and regression, and further improves the accuracy of detecting abnormal targets of the power transmission line.
Other features and advantages of the present invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of a method for detecting an abnormal target of a power transmission line according to the present invention;
FIG. 2 is a partial original transmission line image;
FIG. 3 is an output image of the original power transmission line image of FIG. 2 after the method of FIG. 1;
FIG. 4 is a schematic diagram of a structure for constructing a feature pyramid;
FIG. 5 is a schematic diagram of a non-local network based feature enhancement architecture;
FIG. 6 is a schematic diagram of a feature-guided Anchor generation mechanism.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and examples.
FIG. 1 shows a flowchart of an embodiment of the method for detecting an abnormal target of a power transmission line according to the present invention. Specifically, it is a flowchart of an embodiment that detects abnormal targets of a power transmission line by using feature enhancement and a feature-guided anchor generation mechanism.
As shown in FIG. 1, this embodiment detects abnormal targets of the power transmission line by the following steps, so as to improve the detection accuracy.
Step 11: Acquire an original power transmission line image.
Specifically, pictures acquired by video monitoring of the power transmission line can be used as original power transmission line images. A partial original transmission line image is shown in fig. 2.
Step 12: Extract the features of the original power transmission line image by using the residual network and the feature pyramid to obtain a multi-scale feature pyramid map set. Then fuse the multi-scale feature pyramid map set to obtain a fused feature map.
In this step, the original power transmission line image is input into a residual network, which outputs feature maps at several levels with proportionally scaled sizes in a fully convolutional manner. In a preferred embodiment, the residual network is a ResNet-101 network, whose specific structure is shown in the following table (one).
| ResNet-101 | Input | Output | Output dimension | Convolution kernel size |
| Conv1 | 224*224 | 112*112 | 64 | 7*7, 64 |
| Conv2_x | 112*112 | 56*56 | 64 | 3*3, 64 |
| Conv3_x | 56*56 | 28*28 | 128 | 3*3, 128 |
| Conv4_x | 28*28 | 14*14 | 256 | 3*3, 256 |
| Conv5_x | 14*14 | 7*7 | 512 | 3*3, 512 |
ResNet-101 is divided into five modules: conv1, conv2_x, conv3_x, conv4_x and conv5_x. The outputs of the last layer of the four modules conv2_x, conv3_x, conv4_x and conv5_x are denoted {C2, C3, C4, C5}; conv1 is not included because of its large memory footprint.
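The size progression in table (one) can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a square 224 × 224 input and that each of the five modules halves the spatial resolution, as the table indicates; the function name `stage_sizes` is hypothetical.

```python
def stage_sizes(input_size=224):
    """Spatial size of the output of each ResNet module for a square input.

    Assumes every module halves the resolution, as listed in table (one).
    """
    sizes = {}
    size = input_size
    for name in ["conv1", "conv2_x", "conv3_x", "conv4_x", "conv5_x"]:
        size //= 2
        sizes[name] = size
    return sizes


sizes = stage_sizes(224)
# Only conv2_x..conv5_x feed the feature pyramid; conv1 is skipped.
pyramid_inputs = {
    f"C{i + 2}": sizes[name]
    for i, name in enumerate(["conv2_x", "conv3_x", "conv4_x", "conv5_x"])
}
print(pyramid_inputs)  # {'C2': 56, 'C3': 28, 'C4': 14, 'C5': 7}
```

Each level is half the size of the previous one, which is why a factor-of-2 upsampling is enough to align adjacent levels when the pyramid is built.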
Then, the feature maps output by the ResNet-101 network are combined through a bottom-up path, a top-down path and lateral connections to generate a multi-scale initial feature pyramid map set. A schematic diagram of the structure for constructing the feature pyramid is shown in FIG. 4. In a specific implementation, the higher-level feature map is upsampled by a factor of 2 (nearest-neighbor upsampling is used for simplicity) and fused with the corresponding feature map of the bottom-up path on the left side by element-wise addition. Before the addition, the bottom-up feature map passes through a 1 × 1 convolutional layer to reduce its channel dimension. To begin the iteration, a 1 × 1 convolutional layer is attached to C5 to produce the coarsest-resolution feature map. This process is iterated to generate the feature map set {P2, P3, P4, P5} as the initial feature pyramid map set, whose maps have the same sizes as {C2, C3, C4, C5}, respectively.
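The top-down merge can be sketched on single-channel maps. This is an illustrative sketch only: it assumes the 1 × 1 lateral convolutions have already been applied (so the inputs are the lateral maps), it ignores the channel dimension, and the helper names are hypothetical.

```python
def upsample2x_nearest(fmap):
    """Nearest-neighbor upsampling of a 2-D map by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_maps(a, b):
    """Element-wise addition of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(laterals):
    """Build [P2, P3, P4, P5] from lateral maps ordered finest first."""
    pyramid = [laterals[-1]]  # the coarsest level is used as-is
    for lateral in reversed(laterals[:-1]):
        merged = add_maps(lateral, upsample2x_nearest(pyramid[0]))
        pyramid.insert(0, merged)
    return pyramid


# Toy lateral maps of sizes 8x8, 4x4, 2x2 and 1x1, all filled with 1.0.
laterals = [[[1.0] * n for _ in range(n)] for n in (8, 4, 2, 1)]
P2, P3, P4, P5 = top_down(laterals)
print(len(P2), P2[0][0], P3[0][0], P4[0][0], P5[0][0])  # 8 4.0 3.0 2.0 1.0
```

With constant inputs, each finer level accumulates one more contribution from above, which makes the direction of information flow easy to see.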
The maps of the multi-scale initial feature pyramid map set are then resized to the same size by pooling and interpolation, for example to the size of P2. The resized maps are then added element-wise and averaged to obtain the fused feature map. Because feature maps of different scales contain different information, adding and averaging balances the information between them, so that the fused feature map is more representative.
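The resize-and-average fusion can be sketched as follows. This is a minimal sketch on single-channel maps; nearest-neighbor sampling stands in for the pooling and interpolation described above, and the function names are hypothetical.

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a 2-D map to (out_h, out_w) by nearest-neighbor sampling."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse(pyramid):
    """Resize every level to the size of the first level and average them."""
    out_h, out_w = len(pyramid[0]), len(pyramid[0][0])
    resized = [resize_nearest(p, out_h, out_w) for p in pyramid]
    n = len(resized)
    return [
        [sum(m[r][c] for m in resized) / n for c in range(out_w)]
        for r in range(out_h)
    ]


# Three toy levels (4x4, 2x2, 1x1) holding the constants 4, 2 and 0.
pyramid = [
    [[4.0] * 4 for _ in range(4)],
    [[2.0] * 2 for _ in range(2)],
    [[0.0]],
]
fused = fuse(pyramid)
print(fused[0])  # [2.0, 2.0, 2.0, 2.0]
```

Averaging rather than summing keeps the fused values in the same range as the inputs regardless of how many levels are combined.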
Step 13: Input the fused feature map into a non-local network, perform feature enhancement, and generate an enhanced feature pyramid map set.
FIG. 5 shows a schematic structural diagram of feature enhancement based on the non-local network. The fused feature map obtained in step 12 is input into the non-local network, which enhances it to obtain an enhanced feature map. The enhanced feature map is then resized by pooling and interpolation to finally obtain the enhanced feature pyramid map set {T2, T3, T4, T5}.
The significance of this step is that many of the original power transmission line images are not clear, which affects the quality of feature extraction. Further enhancing the extracted features with a non-local network removes noise and makes the extracted features clearer.
The non-local network is built by non-local modeling, which captures long-range dependencies. Introducing global information in certain layers overcomes the inability of local operations to see the global context and brings richer information to subsequent layers. The non-local modeling formula is:

$$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$

In the formula, $C(x)$ is a normalization factor used to obtain the final response value. $C(x)$ is set as:

$$C(x) = \sum_{\forall j} f(x_i, x_j)$$

In the above formulas, $y_i$ is the function value representing the relation between the current point and the global information, $x$ is the input signal, $i$ is the index of the output position, and $j$ enumerates every position on the image. The function $f$ computes the similarity between the input signal $x$ at position $i$ and at position $j$, and the function $g$ computes the feature value of the input signal $x$ at position $j$.
For ease of processing, the function $g$ is taken to be a linear transformation, $g(x_j) = W_g x_j$, where $W_g$ is a weight determined through training. The function $f$ is preferably an embedded Gaussian function:

$$f(x_i, x_j) = e^{\theta(x_i)^{T} \phi(x_j)}$$

where $\theta(x_i) = W_\theta x_i$ and $\phi(x_j) = W_\phi x_j$; $W_\theta$ and $W_\phi$ are weights determined through training.
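The non-local operation can be illustrated on a handful of positions. The sketch below assumes the standard embedded Gaussian form, in which the similarity is the exponential of the dot product of the two embeddings and the normalization factor is the sum of the similarities over all positions; the weight matrices are toy values and the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def non_local(xs, W_theta, W_phi, W_g):
    """Embedded Gaussian non-local operation.

    xs holds one feature vector per position; one response y_i is returned
    for every position i.
    """
    thetas = [matvec(W_theta, x) for x in xs]
    phis = [matvec(W_phi, x) for x in xs]
    gs = [matvec(W_g, x) for x in xs]
    ys = []
    for th in thetas:
        scores = [dot(th, ph) for ph in phis]
        m = max(scores)  # subtracted for numerical stability; cancels in f / C
        f = [math.exp(s - m) for s in scores]
        c = sum(f)  # normalization factor C(x)
        ys.append(
            [sum(fj * g[k] for fj, g in zip(f, gs)) / c for k in range(len(gs[0]))]
        )
    return ys


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# With zero embeddings every similarity is equal, so each response is the
# plain average of g(x_j) over all positions.
uniform = non_local(xs, zeros, zeros, identity)
print(uniform[0])

# With identity embeddings, positions with similar features weigh more.
print(non_local(xs, identity, identity, identity)[0])
```

Because every position attends to every other position, the cost grows with the square of the number of positions, which is one reason such a block is usually applied to a single fused map rather than to every pyramid level.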
Step 14: Input each feature map in the enhanced feature pyramid map set into a feature-guided anchor generation mechanism, predict the position and the shape of the anchor respectively, generate a target anchor, and then perform feature adaptive adjustment on the enhanced feature pyramid map set by using the target anchor to generate a corrected feature pyramid map set.
The anchor generation mechanism is applied to the feature map of each resolution in the enhanced feature pyramid map set {T2, T3, T4, T5} obtained in step 13. The mechanism has two branches that predict the position and the shape of the anchor respectively, and the two predictions are combined to generate the target anchor. FIG. 6 shows a schematic structural diagram of the feature-guided anchor generation mechanism.
For the position prediction branch, the goal is to predict which regions should serve as center points for generating anchors, which is a binary classification problem. Unlike a conventional RPN, this branch does not predict whether each point is foreground or background, but whether the point is the center of an object. The area of the whole feature map is divided into an object center region, a peripheral region and an ignored region. The basic idea is that a small region on the feature map corresponding to the center of the real box (ground truth) is marked as the object center region and used as a positive sample during training; the remaining regions are marked as ignored or negative samples according to their distance from the center. The position prediction branch screens out a small part of the area as candidate center points for anchors, which greatly reduces the number of anchors.
The specific implementation method is as follows: an $N_L$ network is applied to each feature map in the enhanced feature pyramid map set to generate a probability map, in which the probability at each point represents the probability that the point is the center of a target object. A probability threshold is set, and the points whose probability is greater than the threshold are screened out and predicted to be anchor centers, that is, the positions where anchors may exist.
The $N_L$ network comprises a 1 × 1 convolution operation and a sigmoid function. The 1 × 1 convolution operation generates a map of the same size as the feature map, and the output position (m, n) of the map corresponds to ((m + 0.5)s, (n + 0.5)s) on the original feature map, where the distance s between adjacent anchor points is the stride of the original feature map. The sigmoid function generates the probability map.
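The thresholding and coordinate mapping of the position branch can be sketched as below. The sketch assumes the 1 × 1 convolution has already produced a map of raw scores (logits); the stride, threshold and score values are made up for illustration, and `anchor_centers` is a hypothetical name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def anchor_centers(logits, stride, threshold):
    """Return (coordinate, coordinate, probability) for every retained point.

    Output position (m, n) maps to ((m + 0.5) * stride, (n + 0.5) * stride),
    following the correspondence stated above.
    """
    centers = []
    for m, row in enumerate(logits):
        for n, z in enumerate(row):
            p = sigmoid(z)
            if p > threshold:
                centers.append(((m + 0.5) * stride, (n + 0.5) * stride, p))
    return centers


logits = [[-3.0, 2.0], [0.0, -1.0]]  # toy output of the 1 x 1 convolution
centers = anchor_centers(logits, stride=8, threshold=0.5)
print(centers)
```

Only the point with score 2.0 passes the 0.5 threshold here; the point with score 0.0 has probability exactly 0.5 and is rejected by the strict comparison.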
For the shape prediction branch, the goal is to predict the optimal width and height for a given anchor center point, which is a regression problem. The specific implementation method is as follows:
an $N_S$ network is applied to each feature map in the enhanced feature pyramid map set. The $N_S$ network comprises a 1 × 1 convolution operation that generates a two-channel map of the same size as the feature map, the two channels representing the width w and the height h respectively, and IoU (intersection over union) is used as supervision to learn w and h.
The IoU between the predicted anchor and the real box is expressed as:

$$\mathrm{vIoU}(a_{wh}, gt) = \max_{w > 0,\, h > 0} \mathrm{IoU}(a_{wh}, gt)$$

where $a_{wh} = \{(x_0, y_0, w, h) \mid w > 0, h > 0\}$; $x_0$, $y_0$ are the coordinates of the predicted anchor, determined by the predicted position; $w$, $h$ are the predicted width and height of the anchor; $gt = (x_g, y_g, w_g, h_g)$, where $x_g$, $y_g$ are the coordinates of the center of the real box and $w_g$, $h_g$ are its width and height.
It should be noted that it is not possible to traverse all possible values of w and h to find the maximum IoU, so the maximum is approximated by sampling a number of candidate values of w and h.
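The sampling approximation can be sketched as follows: for a fixed anchor center, the IoU with the real box is evaluated for a small set of candidate (w, h) pairs and the largest value is kept. The candidate list, the box values and the function names are hypothetical; the text does not state how the candidates are chosen.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (center x, center y, width, height)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


def sampled_best_shape(x0, y0, gt, candidates):
    """Approximate the maximum IoU over (w, h) by trying sampled candidates."""
    best_wh = max(candidates, key=lambda wh: iou((x0, y0, wh[0], wh[1]), gt))
    return best_wh, iou((x0, y0, best_wh[0], best_wh[1]), gt)


gt = (10.0, 10.0, 8.0, 4.0)  # real box: center (10, 10), width 8, height 4
candidates = [(4.0, 4.0), (8.0, 4.0), (8.0, 8.0), (16.0, 8.0)]
best_wh, best_iou = sampled_best_shape(10.0, 10.0, gt, candidates)
print(best_wh, best_iou)  # (8.0, 4.0) 1.0
```

A denser candidate set gives a tighter approximation of the true maximum at the cost of more IoU evaluations per anchor.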
The target anchors generated by the anchor generation mechanism differ in shape and size from position to position, so they do not match the features well enough. In the preferred embodiment, feature adaptive adjustment is therefore also performed by using the generated target anchors to generate a corrected feature pyramid map set. Specifically, the shape information of the anchor is fused directly into the feature maps of the enhanced feature pyramid map set to generate new feature maps that adapt to the shape of the anchor at each position. The specific implementation method is as follows:
each feature map in the enhanced feature pyramid map set is modified by a 3 × 3 deformable convolution to generate the corrected feature pyramid map set.
The offsets of the deformable convolution are obtained by applying a 1 × 1 convolution operation to the width and height of the target anchor.
Step 15: Detect the corrected feature pyramid map set by using a detector, correct the detection result by using a loss function, and output a final target detection result.
The detector is obtained through a training process. The detector detects the corrected feature pyramid map set, the loss function is used to continuously correct each detection result so that it approaches the true value, and the final target detection result is then output.
In the target detection method of this embodiment, in addition to the basic classification loss and regression loss, the position and shape of the anchor must be learned, so two additional loss functions $L_{loc}$ and $L_{shape}$ are required. The final loss function $L$ is therefore:

$$L = \lambda_1 L_{loc} + \lambda_2 L_{shape} + L_{mask} + L_{reg} + L_{cls}$$

where $\lambda_1$ and $\lambda_2$ are coefficients greater than 0. In some preferred embodiments, $\lambda_1 = 1$ and $\lambda_2 = 0.1$.
$L_{loc} = -(1 - p_t)^{\gamma} \log(p_t)$ is the loss function corresponding to anchor position prediction in the feature-guided anchor generation mechanism, where

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases}$$

$y \in \{0, 1\}$ is the class of the real box, with $y = 0$ denoting the negative class and $y = 1$ the positive class; $p$ is the probability that the point is predicted to be a center position; $\gamma$ is a modulation factor greater than 0 that adjusts the rate at which the weight of easy samples is reduced. Experiments showed that $\gamma = 2$ gives the best effect.
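The effect of the modulation factor can be seen in a short sketch of the position loss, using the focal form given above with the probability taken relative to the true class. The probabilities are toy values and `focal_loss` is a hypothetical name; the probability must lie strictly between 0 and 1.

```python
import math


def focal_loss(p, y, gamma=2.0):
    """Position loss for one point.

    p is the predicted probability that the point is a center, y is 1 for a
    positive point and 0 for a negative one.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)   # confident and wrong: keeps most of its weight
plain = focal_loss(0.9, 1, gamma=0.0)  # gamma = 0 reduces to cross-entropy
print(easy, hard, plain)
```

With the modulation factor set to 2, the easy example contributes about one hundredth of its plain cross-entropy loss, so training is dominated by the hard examples.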
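$$L_{shape} = L_1\left(1 - \min\left(\frac{w}{w_g}, \frac{w_g}{w}\right)\right) + L_1\left(1 - \min\left(\frac{h}{h_g}, \frac{h_g}{h}\right)\right)$$

is the loss function corresponding to anchor shape prediction in the feature-guided anchor generation mechanism, where $w$, $h$ are the predicted width and height of the anchor, and $w_g$, $h_g$ are the width and height of the real box. $L_1$ is the smooth loss function $\mathrm{smooth}_{L1}$. In some preferred embodiments, the expression of $\mathrm{smooth}_{L1}$ is:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where the parameter $x$ is the difference between the predicted result and the true value.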
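A small numeric sketch of the shape loss follows. It assumes the usual piecewise smooth L1 function (quadratic below 1, linear above) and assumes that the shape loss compares predicted and real widths and heights through the ratio min(w/w_g, w_g/w), as in guided anchoring; both are assumptions for illustration, and the function names are hypothetical.

```python
def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def shape_loss(w, h, wg, hg):
    """Shape loss for one anchor, assuming the ratio-based form."""
    return smooth_l1(1.0 - min(w / wg, wg / w)) + smooth_l1(
        1.0 - min(h / hg, hg / h)
    )


print(smooth_l1(0.5), smooth_l1(2.0))      # 0.125 1.5
print(shape_loss(8.0, 4.0, 8.0, 4.0))      # 0.0 for a perfect match
print(shape_loss(4.0, 4.0, 8.0, 4.0))      # 0.125: width is half the real width
```

Because the ratio is symmetric, predicting a width twice too large is penalized exactly as much as predicting one twice too small.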
$L_{mask}$ and $L_{cls}$ are the loss function of the mask branch and the classification loss function, respectively. Both are cross-entropy losses of the form

$$L = -\left[p_k^{*} \log p_k + (1 - p_k^{*}) \log(1 - p_k)\right]$$

where $p_k$ is the probability that the k-th anchor is predicted as positive, and $p_k^{*}$ is the label indicating whether the k-th anchor belongs to the positive or the negative class: 1 for the positive class and 0 for the negative class.
$L_{reg}$ is also the smooth loss function $\mathrm{smooth}_{L1}$.
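The way the five terms combine can be sketched numerically. The sketch assumes a per-anchor binary cross-entropy for the mask and classification terms and uses the preferred coefficients of 1 and 0.1 for the position and shape terms; the individual loss values passed in are made up, and the function names are hypothetical.

```python
import math


def binary_cross_entropy(p, label):
    """Cross-entropy for one anchor; p is the predicted positive probability."""
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(l_loc, l_shape, l_mask, l_reg, l_cls, lambda1=1.0, lambda2=0.1):
    """Weighted sum of the position, shape, mask, regression and class losses."""
    return lambda1 * l_loc + lambda2 * l_shape + l_mask + l_reg + l_cls


l_cls = binary_cross_entropy(0.8, 1)  # positive anchor predicted at 0.8
l_mask = binary_cross_entropy(0.3, 0)  # negative label predicted at 0.3
loss = total_loss(l_loc=0.05, l_shape=0.125, l_mask=l_mask, l_reg=0.2, l_cls=l_cls)
print(l_cls, l_mask, loss)
```

The small coefficient on the shape term keeps the anchor shape objective from dominating the detection losses during joint training.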
The original transmission line image of fig. 2 is detected by the method of the embodiment of fig. 1, and the obtained output result image is shown in fig. 3.
The method of this embodiment and its preferred embodiments detects abnormal targets of the power transmission line as follows. A residual network and a feature pyramid extract the features of the original power transmission line image to obtain a multi-scale feature pyramid map set, and a non-local network then enhances the feature map obtained by fusing that set. This has a denoising effect and gives the feature map a higher resolution, which effectively addresses the unclear images and inaccurate feature extraction found in power transmission line abnormal target data sets and helps to improve detection accuracy. Moreover, a feature-guided anchor generation mechanism predicts the position and the shape of the anchor from the feature maps of the enhanced feature pyramid map set, generates the target anchor, and adaptively adjusts the feature maps based on the target anchor. This improves the efficiency of anchor generation, produces sparse anchors of arbitrary shape, improves the quality of the proposed regions, facilitates classification and regression, and further improves the accuracy of detecting abnormal targets of the power transmission line.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.