CN108647655B

CN108647655B - Low-altitude aerial image power line foreign matter detection method based on light convolutional neural network

Info

Publication number: CN108647655B
Application number: CN201810465955.2A
Authority: CN
Inventors: 张菁; 王立元; 卓力; 梁西; 李昱钊
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2018-05-16
Filing date: 2018-05-16
Publication date: 2022-07-12
Anticipated expiration: 2038-05-16
Also published as: CN108647655A

Abstract

A low-altitude aerial image power line foreign matter detection method based on a light convolutional neural network belongs to the field of computer vision and addresses real-time detection of power line foreign matter in aerial images captured by unmanned aerial vehicles (UAVs). First, a light power line detection model is constructed with a convolutional neural network and used to compute deep features of the power lines in aerial images. Next, a multi-target power line foreign matter detection model is built with a convolutional neural network; it uses convolutional layers of different lengths and widths and computes predictions for multi-scale targets from the deep features. Finally, the power line detection model filters out video frames that contain no power lines, and the multi-target foreign matter detection model is applied to the frames in which power lines were detected, achieving real-time detection of power line foreign matter in low-altitude aerial images.

Description

Low-altitude aerial image power line foreign matter detection method based on light convolutional neural network

Technical Field

The invention discloses a deep-learning-based method for real-time detection of power line foreign matter in UAV aerial images. First, a light power line detection model is constructed with a convolutional neural network and used to compute deep features of the power lines in aerial images. Next, a multi-target power line foreign matter detection model is built with a convolutional neural network; it uses convolutional layers of different lengths and widths and computes predictions for multi-scale targets from the deep features. Finally, the power line detection model filters out video frames that contain no power lines, and the multi-target foreign matter detection model is applied to the frames in which power lines were detected, achieving real-time detection of power line foreign matter in low-altitude aerial images. The invention belongs to the field of computer vision and relates in particular to deep learning and object detection.

Background

With the development of information technology, high-performance aerial sensors have been widely applied in aerial photography. As unmanned aerial vehicle (UAV) technology matures, low-altitude aerial photography has developed rapidly and become a new, widely practical technique with broad prospects. Low-altitude aerial image data is growing massively and is characterized by multiple viewing angles and complex backgrounds, so real-time, efficient processing of this data has important research significance and application value. Such processing has important applications in natural disaster assessment, transportation, urban planning and other fields. Owing to the efficiency and safety of UAVs, power line inspection in power systems has also become one of its important application fields.

Power lines are important national infrastructure responsible for power transmission, and power line inspection is an important guarantee of the stable operation of power systems. With the rapid development of China's electric power system and the maturity of technologies such as long-distance high-voltage lines and ultra-high-voltage transmission, power transmission is characterized by large capacity and long distance, and more and more power lines extend from cities into mountains, fields and other complex geographic environments. Traditional manual inspection is time-consuming and labor-intensive, is affected by the natural environment and climate, and constrains the construction of China's power system. UAV power line inspection is efficient and safe and is not limited by weather or terrain, and it has become an important mode of power line inspection. During UAV inspection, the digital camera mounted on the UAV captures low-altitude power line images that record the basic condition of the power lines. By processing these low-altitude aerial power line images, abnormal states of the power lines can be found in time and dealt with quickly.

Image processing technology covers image compression, segmentation, enhancement, description, recognition and so on, and object recognition is one of its important applications. Traditional object recognition relies on hand-crafted features and struggles with diverse targets against complex backgrounds and with massive data. In recent years, deep learning, the latest technology in artificial intelligence, has shown excellent performance in object detection: deep detection frameworks represented by SSD (Single Shot MultiBox Detector) achieve high-precision recognition while further improving efficiency.

The invention provides an aerial image power line foreign matter detection method based on a multi-scale convolutional neural network, aiming to address problems such as the ever-growing volume of power line inspection images and the limited effectiveness of traditional object recognition methods. First, a lightweight power line detection model based on a Convolutional Neural Network (CNN) is constructed, and multi-level deep features of power line images are learned on a pre-training dataset. Next, a CNN-based power line foreign matter detection model is constructed, which processes targets of different scales with convolutional layers of different lengths and widths to obtain multi-scale target predictions. Then the power line detection module filters out irrelevant images that contain no power lines, and the multi-scale target predictions are combined. Finally, the non-maximum suppression algorithm is applied to the multi-scale target predictions, boxes with high confidence are retained, and power line abnormal targets are detected.

Disclosure of Invention

Unlike existing power line foreign matter detection methods, the invention uses deep learning to provide a real-time power line foreign matter detection method based on a light convolutional neural network. First, a light power line detection model is constructed with a Convolutional Neural Network (CNN). Because power lines form a single target class, a light model with few layers is sufficient for single-target detection and effectively reduces training and detection time. The network is pre-trained on a self-built power line image dataset to extract deep power line features. Second, a CNN is used to train a power line foreign matter detection model; convolutional layers with different lengths and widths are added to the model, and predictions are computed on several layers simultaneously. The outputs of the different layers are then combined so that deep features of multi-scale targets are learned. The model is pre-trained on a self-built power line foreign matter dataset, and data augmentation with random flipping, cropping and color changes expands the data volume and further improves generalization.
Finally, in the detection stage, the power line detection model first removes irrelevant frames from the video, keeps the key frames containing power lines and extracts the power line bounding box predictions in those frames. The power line foreign matter detection model then processes the key frames to obtain predictions for all targets; the non-maximum suppression algorithm filters out highly similar bounding boxes and keeps those with higher confidence. The resulting power line bounding boxes and foreign matter bounding boxes together enable fast and accurate detection of foreign matter in aerial power line images. The main flow of the method is shown in FIG. 1 and consists of three steps: construction of the CNN-based power line foreign object detection models, neural network pre-training, and power line abnormal target detection.

(1) Power line detection model construction based on light convolutional neural network

Since the research object of the invention is aerial imagery, a power line detection model based on a light convolutional neural network is first constructed to remove irrelevant frames from videos effectively. Its network structure is simple and has few layers, which guarantees real-time detection while still extracting power line features effectively. For the two categories of foreign matter on power lines, kites and balloons, a power line foreign matter detection model based on a light convolutional neural network is also constructed. The two models detect in sequence, improving detection precision while also improving real-time performance.

(2) Neural network pre-training

The power line detection model is pre-trained on the Powerline Image Dataset. For the power line foreign matter detection model, self-collected balloon and kite images serve as source data and are expanded to 4000 images by data augmentation (translation, cropping and color changes) to form the training dataset. The datasets contain power line and power line foreign object images at different scales, lighting conditions and shooting angles, so deep features under different conditions can be learned effectively.
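The key point of box-level augmentation is that every geometric change to the image must be applied to its bounding box as well. The following is a minimal sketch under stated assumptions, not the inventors' augmentation code: images are toy grayscale lists of 0-255 values, boxes are `(x0, y0, x1, y1)` pixel tuples with exclusive right/bottom edges, the flip probability (0.5), crop-size range (half to full size) and brightness range (0.7-1.3) are illustrative choices, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Image = List[List[int]]          # toy grayscale image: rows of 0-255 values


def hflip_box(box: Box, img_w: int) -> Box:
    """Mirror a box horizontally inside an image of width img_w."""
    x0, y0, x1, y1 = box
    return (img_w - x1, y0, img_w - x0, y1)


def crop_box(box: Box, crop: Box) -> Optional[Box]:
    """Express a box in crop coordinates, clipped to the crop; None if nothing remains."""
    cx0, cy0, cx1, cy1 = crop
    x0, y0 = max(box[0], cx0) - cx0, max(box[1], cy0) - cy0
    x1, y1 = min(box[2], cx1) - cx0, min(box[3], cy1) - cy0
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)


def jitter_brightness(img: Image, factor: float) -> Image:
    """Scale intensities by factor and clamp to [0, 255] (a simple color change)."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img: Image, box: Box, rng: random.Random) -> Tuple[Image, Box]:
    """Random horizontal flip, random crop and brightness jitter, keeping the box in sync."""
    h, w = len(img), len(img[0])
    if rng.random() < 0.5:  # random horizontal flip
        img = [row[::-1] for row in img]
        box = hflip_box(box, w)
    cw, ch = rng.randint(w // 2, w), rng.randint(h // 2, h)  # random crop size
    cx0, cy0 = rng.randint(0, w - cw), rng.randint(0, h - ch)
    cropped = crop_box(box, (cx0, cy0, cx0 + cw, cy0 + ch))
    if cropped is not None:  # only crop if part of the object survives
        img = [row[cx0:cx0 + cw] for row in img[cy0:cy0 + ch]]
        box = cropped
    return jitter_brightness(img, rng.uniform(0.7, 1.3)), box
```

Skipping the crop when it would remove the object entirely is one simple way to avoid producing training samples whose label no longer matches the image.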

(3) Power line anomaly target detection

The invention provides a multi-target power line detection method. First, the power line detection model examines the aerial video frame by frame and discards irrelevant frames without power line targets. Key frames containing power line targets are further processed by the power line foreign object detection model, which computes the power line parameter predictions and the foreign object bounding box predictions to judge whether power line foreign objects are present.

Compared with the prior art, the invention has the following obvious advantages and beneficial effects:

First, compared with traditional power line target recognition methods based on hand-crafted features, the method uses convolutional neural networks to build a light power line detection model and a power line foreign matter detection model, filters irrelevant frames out of the power line images, greatly improves detection efficiency, and uses the light network to guarantee real-time foreign matter detection. Experiments show that this structure effectively filters irrelevant frames in UAV aerial images and greatly improves detection efficiency. Second, multi-scale convolutional layers are added to the light power line foreign matter detection model to learn foreign matter features at different scales, so the method copes with the scale variation caused by the UAV photographing targets at different distances.

Finally, for the screened power line images, the light power line model and the power line foreign object model compute the power line parameter predictions and the foreign object bounding box predictions, which are used to judge whether a power line foreign object is present.

Experiments show that a VGG-16-based deep neural network that learns with multi-scale convolutional layers achieves 74.3% mAP (mean average precision) on the VOC2007 dataset while maintaining a detection speed of 59 FPS. Transferring this approach to power line abnormal target detection is therefore feasible and has important application value for efficient, accurate and real-time power line inspection.

Description of the drawings:

FIG. 1 is a flow chart of a method for detecting foreign matters in aerial image power lines based on a light convolutional neural network

FIG. 2 architecture diagram of a light power line detection model

FIG. 3 power line foreign matter detection model architecture diagram

FIG. 4 a diagram of a process for detecting foreign objects on a power line

Detailed Description

Based on the above description, a specific implementation flow is as follows, but the scope of protection of this patent is not limited to this implementation flow.

Step 1: power line foreign object target detection model construction based on convolutional neural network

Step 1.1: power line detection model construction based on light convolutional neural network

Existing deep learning object detection models target broad application scenarios and can often detect thousands of object categories; YOLO9000, for example, covers 9418 classes. In a power line inspection scene, the target types are very limited (mainly power lines, balloons and kites), and this model only needs to recognize power lines, whose features are also limited. Existing deep detection models are therefore overly redundant for this scene, whereas a lightweight model can recognize the small set of target classes while also increasing detection speed.

The light convolutional neural network is implemented in Caffe, a mainstream open-source deep learning framework; its structure is shown in FIG. 2. The input power line aerial image passes through 6 convolutional layers with 3 × 3 kernels. The first four convolutional layers use Batch Normalization, which normalizes the input of the subsequent activation function within each batch to a standard normal distribution (mean 0, standard deviation 1) and makes the values more stable; after Batch Normalization, the Rectified Linear Unit (ReLU) is used as the activation function, which speeds up model convergence. A max pooling operation follows the 4th convolutional layer to reduce feature dimensionality and computational cost. The fifth convolutional layer uses a 3 × 3 kernel as the class prediction module with 6 output channels, each corresponding to the confidence of one anchor box. The sixth convolutional layer uses a 3 × 3 kernel to predict the bounding boxes. For each prediction box, its class is determined from the computed class prediction values, and boxes classified as background are discarded. Boxes with confidence below the threshold 0.5 are then filtered out, and the 200 boxes with the highest confidence are retained. Finally, the non-maximum suppression (NMS) algorithm removes boxes whose IoU with a higher-confidence box exceeds 0.7, yielding the final prediction result.
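The filtering sequence at the end of this step (drop background, confidence threshold 0.5, keep the top 200, NMS at IoU 0.7) can be sketched in plain Python as follows. This is a minimal sketch, not the patented Caffe implementation: boxes are assumed to be corner-format tuples, `class_scores[i][0]` is assumed to be the background score, NMS is assumed to be applied per class (common practice, not stated in the source), and the names `iou` and `postprocess` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes: List[Box], class_scores: List[List[float]],
                conf_thresh: float = 0.5, top_k: int = 200,
                nms_thresh: float = 0.7) -> List[Tuple[Box, int, float]]:
    """Return (box, class, confidence) triples that survive all filters."""
    candidates = []
    for box, scores in zip(boxes, class_scores):
        cls = max(range(len(scores)), key=lambda k: scores[k])
        if cls == 0:                     # assumed background index: discard
            continue
        if scores[cls] < conf_thresh:    # low confidence: discard
            continue
        candidates.append((box, cls, scores[cls]))
    candidates.sort(key=lambda c: c[2], reverse=True)
    candidates = candidates[:top_k]      # keep the top_k most confident boxes
    kept: List[Tuple[Box, int, float]] = []
    for cand in candidates:              # greedy NMS, highest confidence first
        if all(k[1] != cand[1] or iou(k[0], cand[0]) <= nms_thresh for k in kept):
            kept.append(cand)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [[0.1, 0.9], [0.2, 0.8], [0.6, 0.4], [0.45, 0.55]]
result = postprocess(boxes, scores)
```

In this usage, the second box overlaps the first with IoU 0.81 and is suppressed, the third is classified as background, and the fourth passes the 0.5 threshold, so two boxes remain.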

Step 1.2: power line foreign matter detection model construction based on convolutional neural network

The detailed structure of the convolutional neural network proposed in this step is shown in FIG. 3. The input aerial image of power line foreign objects passes through 10 convolutional layers with 3 × 3 kernels. The first 6 convolutional layers form the backbone network that extracts the features of power line foreign objects, with a pooling operation after the 2nd, 4th and 6th convolutional layers. The 7th and 8th layers each use a 3 × 3 kernel followed by Batch Normalization, which normalizes the input of the subsequent activation function within each batch to a standard normal distribution (mean 0, standard deviation 1) for more stable values, and a Rectified Linear Unit (ReLU) activation, which speeds up convergence. A max pooling layer with stride 2 follows each of the 7th and 8th layers, halving the length and width of the input features. The 7th, 8th, 9th and 10th convolutional layers serve as prediction modules; each module contains two 3 × 3 convolutional layers, one for class prediction and one for bounding box prediction, so that predictions at different scales are retained across layers. The multi-scale predictions are then converted into two-dimensional arrays whose first dimension is the number of samples and whose second dimension holds the flattened channel values, and all outputs are concatenated along the second dimension to combine the multi-scale predictions.
For each prediction box, its class is determined from the computed class prediction values, and boxes classified as background are discarded. Boxes with confidence below the threshold 0.5 are then filtered out, and the 200 boxes with the highest confidence are retained. Finally, the non-maximum suppression (NMS) algorithm removes boxes whose IoU with a higher-confidence box exceeds 0.7, yielding the final prediction result.
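The flatten-and-concatenate step that merges the per-scale outputs of layers 7 to 10 can be sketched with nested Python lists. This sketch assumes predictions arrive as `[N][C][H][W]` lists and moves channels to the last axis before flattening (one common choice, which keeps the values of each spatial position adjacent); the toy spatial sizes 8, 4, 2 and 1 are illustrative, and the channel count 5 × (2 + 1) assumes five anchors per pixel and the two foreign object classes (balloon, kite) plus background. All function names are hypothetical.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]  # nested lists indexed [N][C][H][W]


def flatten_pred(pred: Tensor4D) -> List[List[float]]:
    """Move channels last (N, H, W, C), then flatten each sample to H*W*C values."""
    out = []
    for sample in pred:  # sample is [C][H][W]
        c, h, w = len(sample), len(sample[0]), len(sample[0][0])
        out.append([sample[k][y][x] for y in range(h) for x in range(w) for k in range(c)])
    return out


def concat_preds(preds: List[Tensor4D]) -> List[List[float]]:
    """Concatenate the flattened predictions of several scales along dimension 1."""
    flat = [flatten_pred(p) for p in preds]
    return [sum((f[i] for f in flat), []) for i in range(len(flat[0]))]


def constant(n: int, c: int, h: int, w: int, value: float = 0.0) -> Tensor4D:
    """Build an [n][c][h][w] tensor filled with value."""
    return [[[[value] * w for _ in range(h)] for _ in range(c)] for _ in range(n)]


num_anchors, num_classes = 5, 2
channels = num_anchors * (num_classes + 1)  # one score per class (+ background) per anchor
scales = [constant(2, channels, s, s) for s in (8, 4, 2, 1)]
merged = concat_preds(scales)
```

Flattening each scale first is what makes concatenation possible, since the four feature maps have different heights and widths; here each sample ends up with 15 × (64 + 16 + 4 + 1) = 1275 class scores.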

Step 2: neural network pre-training

The method trains the light power line detection model on the power line image dataset and the power line foreign matter model on the power line foreign matter dataset. At detection time, power line images are fed to the power line detection model to filter out irrelevant frames, and the key frames containing power lines are sent to the power line foreign matter detection model, realizing real-time power line foreign object detection.

Step 2.1: target detection model pre-training

Step 2.1.1: constructing a pre-training data set

In the pre-training stage, the Powerline Image Dataset is selected to train the power line target detection model; it contains 2000 aerial images of power lines and 2000 aerial background images. The power line images were taken in different regions and seasons, with an image size of 512 × 512. The power line foreign matter detection model is trained on the power line foreign matter dataset, which contains 1000 aerial images of balloons and kites covering different angles, regions and backgrounds.

Step 2.1.2: model pre-training

In the power line foreign object scene, a box may appear at any position in the image and have any size. To simplify the search, the power line foreign object model uses default bounding boxes, i.e., anchor boxes, as search starting points. Anchor boxes are configured by two parameters: size (scale) and aspect ratio. For an input of size w × h and a given size s ∈ (0, 1), an anchor box of size ws × hs is generated; for a given ratio r > 0, an anchor box of size

$$ws\sqrt{r} \times \frac{hs}{\sqrt{r}}$$

is generated. In the invention, s takes the values 0.1, 0.25 and 0.5, and r takes 0.5, 1 and 2. Five default anchor boxes are sampled centered on each input pixel. During training, it is first determined which anchor box each ground-truth box in the training data is matched to, and the bounding box predicted from that anchor box is used for prediction. For each real object in the image, the anchor box with the largest Intersection over Union (IoU) is matched to it. IoU measures how much two boxes overlap, as the ratio of the area of their intersection to the area of their union, as shown in equation (1):

$$\mathrm{IoU}(\alpha,\xi)=\frac{|\alpha\cap\xi|}{|\alpha\cup\xi|} \tag{1}$$

where α is the predicted box and ξ is the ground-truth box. IoU ranges from 0 to 1: a large value indicates that the two boxes are very similar, and a small value indicates that they are dissimilar. Each remaining unmatched anchor box is also matched to a ground-truth box if its IoU with that box exceeds the threshold 0.5.
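Anchor generation and the two-stage matching rule can be sketched as follows. Two assumptions are made that the text does not spell out: the five anchors per pixel are taken to be every size paired with the first ratio plus the first size paired with every other ratio (which yields 3 + 3 - 1 = 5), and the ratios are listed as (1, 2, 0.5) so that ratio 1 comes first. Ties in the first stage are resolved by letting the later ground truth win, a simplification. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # corner format (x_min, y_min, x_max, y_max)


def anchors_at(cx: float, cy: float, img_w: float, img_h: float,
               sizes: List[float], ratios: List[float]) -> List[Box]:
    """Anchors centred on (cx, cy): every size with ratios[0], plus sizes[0] with
    every other ratio, i.e. len(sizes) + len(ratios) - 1 boxes."""
    pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    boxes = []
    for s, r in pairs:
        bw, bh = img_w * s * math.sqrt(r), img_h * s / math.sqrt(r)  # ws*sqrt(r) x hs/sqrt(r)
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Equation (1): intersection area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors: List[Box], gts: List[Box], thresh: float = 0.5) -> List[int]:
    """Index of the ground truth matched to each anchor, or -1 (background)."""
    assign = [-1] * len(anchors)
    if not gts:
        return assign
    # Stage 1: each ground truth claims the anchor with the largest IoU.
    for j, g in enumerate(gts):
        best = max(range(len(anchors)), key=lambda i: iou(anchors[i], g))
        assign[best] = j
    # Stage 2: each still-unmatched anchor takes its best ground truth if IoU > thresh.
    for i, a in enumerate(anchors):
        if assign[i] == -1:
            scores = [iou(a, g) for g in gts]
            j = max(range(len(gts)), key=lambda k: scores[k])
            if scores[j] > thresh:
                assign[i] = j
    return assign


anchors = anchors_at(50, 50, 100, 100, [0.1, 0.25, 0.5], [1, 2, 0.5])
assign = match_anchors(anchors, [(26, 26, 74, 74)])
```

For a 100 × 100 image, the 50 × 50 anchor at the center overlaps the 48 × 48 ground truth with IoU 2304 / 2500 ≈ 0.92 and is the only one matched; the other four anchors stay below 0.5 and are treated as background.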

In both the power line detection model and the power line foreign object detection model, the loss function L(x, c, l, g) is defined as a weighted sum of the localization error (loc) and the confidence error (conf), as shown in equation (2):

$$L(x,c,l,g)=\frac{1}{N}\left(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\right) \tag{2}$$

where x is the input training image, c is the class confidence prediction, l is the prediction of the bounding box corresponding to an anchor box, g is the position parameters of the ground truth, N is the number of positive anchor boxes, and α is the weight balancing the two terms, set to 1.

$L_{loc}(x,l,g)$ is the bounding box regression loss, shown in equation (3), where cx and cy are the center coordinates of a box and w and h are its width and height. The anchor box position is denoted $d=(d^{cx},d^{cy},d^{w},d^{h})$ and the corresponding ground-truth box $b=(b^{cx},b^{cy},b^{w},b^{h})$; $\hat{g}_j^m$, the offset of that box relative to the anchor box, is computed by equations (4), (5), (6) and (7):

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\sum_{m\in\{cx,cy,w,h\}}x_{ij}^{k}\,\mathrm{smooth}_{L1}\left(l_i^m-\hat{g}_j^m\right) \tag{3}$$

$$\hat{g}_j^{cx}=\frac{b_j^{cx}-d_i^{cx}}{d_i^{w}} \tag{4}$$

$$\hat{g}_j^{cy}=\frac{b_j^{cy}-d_i^{cy}}{d_i^{h}} \tag{5}$$

$$\hat{g}_j^{w}=\log\frac{b_j^{w}}{d_i^{w}} \tag{6}$$

$$\hat{g}_j^{h}=\log\frac{b_j^{h}}{d_i^{h}} \tag{7}$$

$l_i^m$ is the predicted value of parameter m of the bounding box corresponding to the i-th anchor box. Pos is the set of positive samples, i is the anchor box index and j is the ground-truth index. $x_{ij}^{k}=1$ indicates that the i-th anchor box is matched to the j-th ground truth, whose class is k, and $x_{ij}^{k}=0$ indicates no match. The Smooth L1 function, $\mathrm{smooth}_{L1}(z)=0.5z^2$ if $|z|<1$ and $|z|-0.5$ otherwise, is used for the localization error.

$L_{conf}(x,c)$ is the class prediction loss, where x is the input image, Neg is the set of negative samples, o is the index of an anchor box taken from the positive samples, q is the index of an anchor box taken from the negative samples, and t is the ground-truth index. $x_{ot}^{p}$ describes the matching state: $x_{ot}^{p}=1$ indicates that the o-th anchor box is matched to the t-th ground truth, whose class is p, and $x_{ot}^{p}=0$ indicates no match. The loss is shown in equation (8), where class 0 denotes the background:

$$L_{conf}(x,c)=-\sum_{o\in Pos}^{N}x_{ot}^{p}\log\hat{c}_o^{p}-\sum_{q\in Neg}\log\hat{c}_q^{0},\qquad \hat{c}_o^{p}=\frac{\exp(c_o^{p})}{\sum_{p}\exp(c_o^{p})} \tag{8}$$
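Equations (2) to (8) can be checked numerically with a small pure-Python sketch. It assumes center-format `(cx, cy, w, h)` boxes, an `assign` list produced by an anchor-matching step, and a caller-supplied list of negative anchors; how negatives are chosen is not specified in the text (the original SSD selects them by hard negative mining), so that choice is left open here. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

CenterBox = Tuple[float, float, float, float]  # (cx, cy, w, h)


def encode(b: CenterBox, d: CenterBox) -> Tuple[float, float, float, float]:
    """Offsets of ground-truth box b relative to anchor d, equations (4)-(7)."""
    return ((b[0] - d[0]) / d[2], (b[1] - d[1]) / d[3],
            math.log(b[2] / d[2]), math.log(b[3] / d[3]))


def smooth_l1(z: float) -> float:
    return 0.5 * z * z if abs(z) < 1 else abs(z) - 0.5


def log_softmax(logits: Sequence[float], k: int) -> float:
    """Log of the softmax probability of class k, computed stably."""
    m = max(logits)
    return logits[k] - m - math.log(sum(math.exp(z - m) for z in logits))


def detection_loss(loc_preds: List[Sequence[float]], cls_logits: List[Sequence[float]],
                   anchors: List[CenterBox], gt_boxes: List[CenterBox],
                   gt_labels: List[int], assign: List[int], negatives: List[int],
                   alpha: float = 1.0) -> float:
    """Equation (2): (L_conf + alpha * L_loc) / N, N = number of positive anchors.

    assign[i]  -> index of the ground truth matched to anchor i, or -1
    negatives  -> anchor indices used as background samples (class 0)
    gt_labels  -> class id (>= 1) of each ground-truth box
    """
    pos = [i for i, j in enumerate(assign) if j >= 0]
    if not pos:
        return 0.0  # no positive anchors: define the loss as 0
    l_loc = sum(smooth_l1(p - t)  # equation (3)
                for i in pos
                for p, t in zip(loc_preds[i], encode(gt_boxes[assign[i]], anchors[i])))
    l_conf = -sum(log_softmax(cls_logits[i], gt_labels[assign[i]]) for i in pos)
    l_conf -= sum(log_softmax(cls_logits[q], 0) for q in negatives)  # equation (8)
    return (l_conf + alpha * l_loc) / len(pos)


loss_one = detection_loss([(0, 0, 0, 0)], [[0.0, 0.0]], [(50, 50, 20, 20)],
                          [(50, 50, 20, 20)], [1], [0], [])
```

In the usage line the predicted offsets equal the encoded target, so the localization term is 0, and the uniform logits give a confidence loss of log 2 for the single positive anchor.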

The loss function is then minimized during training with stochastic gradient descent (SGD): the feature maps of the convolutional layers at different scales are computed and used for prediction, and the prediction outputs of the different layers are combined. Pre-training requires all inputs to have the same size, so the original images are resized to 512 × 512 pixels. The learning rate is the most important parameter of SGD, since it determines how fast the weights are updated; the momentum and the weight decay factor improve the adaptivity of training. Based on experimental observation, the learning rate is set to $10^{-3}$, the momentum to 0.99 and the weight decay factor to its default of 0.0005. SGD is accelerated on an NVIDIA TITAN Xp GPU and run for 60000 iterations.
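Since the model is built in Caffe, the update can be illustrated with the SGD rule described in Caffe's solver documentation, $V_{t+1}=\mu V_t-\eta\nabla L(W_t)$ and $W_{t+1}=W_t+V_{t+1}$, with the L2 weight decay term folded into the gradient. The sketch below applies it with the hyperparameters given above to a toy quadratic loss $f(w)=\tfrac12\lVert w\rVert^2$ (an assumption chosen because its gradient is simply $w$); it is an illustration of the update rule, not the training code, and `sgd_step` is a hypothetical name.

```python
from typing import List, Tuple


def sgd_step(w: List[float], v: List[float], grad: List[float],
             lr: float = 1e-3, momentum: float = 0.99,
             weight_decay: float = 5e-4) -> Tuple[List[float], List[float]]:
    """One SGD-with-momentum update in the style of Caffe's SGD solver."""
    g = [gi + weight_decay * wi for gi, wi in zip(grad, w)]  # add L2 weight decay term
    v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]    # V <- mu * V - lr * grad
    w = [wi + vi for wi, vi in zip(w, v)]                    # W <- W + V
    return w, v


# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w, v = [1.0, -2.0], [0.0, 0.0]
for _ in range(1000):
    w, v = sgd_step(w, v, grad=list(w))
```

With momentum 0.99 the parameters oscillate around the minimum while the oscillation decays by a factor of about $\sqrt{0.99}$ per step, so after 1000 steps both weights are close to 0.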

The detailed pre-training procedure of the power line target detection model is as follows, where $c_1^{(0)}$ and $l_1^{(0)}$ are the initial power line class and boundary predictions, $c_1$ and $l_1$ are the final power line class and boundary predictions, $\beta_1^{(u)}$ denotes the network parameters of the power line detection model, and $u \in (0, 15)$ is the index of the parameter iteration.

1) Read in the power line image dataset and initialize the power line detection model.

2) Run the power line detection network and output the boundary predictions $l_1^{(u)}$ and the class predictions $c_1^{(u)}$.

3) Feed $l_1^{(u)}$ and $c_1^{(u)}$ into the loss functions and sum their outputs, i.e., combine the outputs of the two loss functions into the loss value $L^{(u)}$.

4) Based on $L^{(u)}$, train the power line detection network with SGD and update the parameters to $\beta_1^{(u+1)}$.

5) According to …

6) Repeat steps 2-5 15 times to obtain the final pre-trained parameters $\beta_1$, $c_1$, $l_1$ of the power line detection model.

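The detailed pre-training procedure of the power line foreign object detection model is as follows, where $c_2^{(0)}$ and $l_2^{(0)}$ are the initial foreign object class and boundary predictions, $c_2$ and $l_2$ are the final foreign object class and boundary predictions, and $\beta_2^{(u)}$ denotes the network parameters of the power line foreign object detection model.

1) Read in the power line foreign object image dataset and initialize the power line foreign object detection model.

2) Run the power line foreign object detection network and output the boundary predictions $l_2^{(u)}$ and the class predictions $c_2^{(u)}$.

3) Feed $l_2^{(u)}$ and $c_2^{(u)}$ into the loss functions and sum their outputs to obtain the loss value $L^{(u)}$.

4) Based on $L^{(u)}$, train the power line foreign object detection network with SGD and update the parameters to $\beta_2^{(u+1)}$.

5) According to …

6) Repeat steps 2-5 15 times to obtain the final pre-trained parameters $\beta_2$, $c_2$, $l_2$ of the power line foreign object detection model.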

Step 3: power line foreign object identification

Aerial videos contain a large number of irrelevant frames, such as the UAV's take-off and landing and its flight around the power lines. These frames contain no power line targets and reduce the efficiency of power line foreign object recognition.

Step 3.1: power line target detection

First, a video frame is fed into the power line target detection model, which outputs bounding box and class predictions; non-maximum suppression then retains the bounding boxes with higher confidence, and the final boxes are drawn.

Step 3.1.1: power line target class and boundary prediction

The image $x_i$ to be detected is fed into the power line target detection model, which outputs the class predictions $c_1$ and the boundary predictions $l_1$. Because each pixel generates several anchor boxes, a large number of similar boxes are predicted.

Step 3.1.2: optimization of power line target class and boundary prediction results

For the large number of similar boxes computed in step 3.1.1, non-maximum suppression removes redundant boxes: all boxes are sorted by confidence, the box with the highest confidence is selected, and every remaining box whose IoU with it exceeds the threshold 0.8 is deleted; this process is repeated on the remaining boxes until only higher-confidence boxes are kept. Finally, among the boxes remaining after non-maximum suppression, those with confidence above 0.6 are drawn as the final boxes.

Step 3.2: power line foreign object target detection

The key frames containing power lines obtained in step 3.1 are fed into the power line foreign object target detection model, which computes foreign object boundary and class predictions. For key frames containing foreign objects, the foreign object bounding boxes are checked for overlap with the power line bounding boxes, and finally the overlapping boxes are drawn.

Step 3.2.1: power line foreign object class and boundary prediction

The image $x_i$ to be detected is fed into the power line foreign object detection model, which outputs the class predictions $c_2$ and the boundary predictions $l_2$. Each pixel generates several anchor boxes, so a large number of similar boxes are predicted.

Step 3.2.2: optimization of power line foreign object class and boundary prediction results

For the large number of similar boxes computed in step 3.2.1, non-maximum suppression removes redundant boxes, and the bounding boxes with confidence above 0.6 are kept. The predicted foreign object boxes are then compared with the predicted power line boxes, foreign object boxes whose IoU with every power line box is 0 (i.e., that do not overlap any power line) are deleted, and the remaining boxes are drawn.
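The per-frame decision of step 3 (skip frames without a confident power line, then keep only confident foreign object boxes that overlap a power line box) can be sketched as below. The sketch assumes both detection lists have already passed through non-maximum suppression and are given as `(box, confidence)` pairs in corner format; the names `process_frame` and `iou` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Det = Tuple[Box, float]                  # (box, confidence), assumed already NMS-filtered


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def process_frame(line_dets: List[Det], foreign_dets: List[Det],
                  conf_thresh: float = 0.6) -> Optional[List[Det]]:
    """Foreign object detections that touch a power line, or None for a non-key frame."""
    lines = [d for d in line_dets if d[1] > conf_thresh]
    if not lines:  # no confident power line: irrelevant frame, skip it
        return None
    return [f for f in foreign_dets
            if f[1] > conf_thresh and any(iou(f[0], p[0]) > 0 for p in lines)]


line_dets = [((0, 40, 100, 60), 0.9)]
foreign_dets = [((10, 30, 20, 50), 0.8),    # overlaps the line: kept
                ((10, 80, 20, 90), 0.95),   # no overlap with any line: dropped
                ((50, 45, 55, 55), 0.5)]    # confidence too low: dropped
kept = process_frame(line_dets, foreign_dets)
```

Returning `None` rather than an empty list distinguishes frames that were never key frames from key frames in which no foreign object was found.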

Step 3.3: evaluation of test results

The invention evaluates the boundary prediction results with a criterion based on the mean absolute error (MAE). The absolute error of each prediction is

$$e_i=|f_i-y_i| \tag{9}$$

where $f_i$ is the predicted value, $y_i$ is the true value and $e_i$ is the absolute error; the MAE is the mean of $e_i$ over all n predictions, $\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}e_i$.
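A minimal sketch of the MAE criterion, assuming the predicted and true boundary values are compared coordinate by coordinate as two flat sequences; the function name and the example box coordinates are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean of e_i = |f_i - y_i| (equation (9)) over all predictions."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(f - y) for f, y in zip(preds, targets)]
    return sum(errors) / len(errors)


# A predicted box (x_min, y_min, x_max, y_max) against its ground truth:
mae = mean_absolute_error([12, 20, 48, 62], [10, 20, 50, 60])
```

Here the per-coordinate errors are 2, 0, 2 and 2, so the MAE is 1.5 pixels.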

Claims

1. A low-altitude aerial image power line foreign matter detection method based on a light convolutional neural network, characterized in that:

first, a light power line detection model is constructed with a convolutional neural network, the network is pre-trained on a self-built power line image dataset, and deep power line features are extracted; second, a power line foreign matter detection model is trained with a convolutional neural network, convolutional layers with different lengths and widths are added to the model, and predictions are computed on several layers simultaneously; the outputs of the different layers are then combined so that deep features of multi-scale targets are learned; the model is pre-trained on a self-built power line foreign matter dataset, and data augmentation with random flipping, cropping and color changes expands the data volume and further improves generalization; finally, in the power line foreign matter detection stage, the power line detection model first removes irrelevant frames from the video, keeps the key frames containing power lines and extracts the power line bounding box predictions in those frames; the power line foreign matter detection model then processes the key frames to obtain predictions for all targets, the non-maximum suppression algorithm filters out highly similar bounding boxes and keeps those with higher confidence, and foreign matter in aerial power line images is detected using the obtained power line bounding boxes and foreign matter bounding boxes;

A power line aerial image is input and passed through 6 convolutional layers with 3 × 3 kernels; the first four convolutional layers use batch normalization, which normalizes the input of the subsequent activation function so that each batch follows a standard normal distribution, followed by a rectified linear unit (ReLU) activation, which speeds up model convergence; a max pooling operation after the 4th convolutional layer reduces the feature dimensionality and the amount of computation; the fifth convolutional layer uses a 3 × 3 kernel as the class prediction module, with 6 output channels, each channel corresponding to the confidence of one anchor box; the sixth convolutional layer uses a 3 × 3 kernel to predict the bounding boxes; each prediction box is assigned a category according to its computed class prediction value, and boxes belonging to the background are discarded; boxes whose confidence is below the threshold of 0.5 are then filtered out, and the 200 boxes with the highest confidence are retained; finally, the non-maximum suppression (NMS) algorithm removes prediction boxes whose IoU with a higher-confidence box exceeds 0.7, yielding the prediction result;
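The post-processing described above, up to the NMS step, could look roughly like the following Python sketch. It assumes each prediction carries a score vector with index 0 for the background class and a corner-format box; the data layout and the names `prefilter` and `Detection` are hypothetical, and NMS itself is not included here.

```python
from typing import List, NamedTuple, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


class Detection(NamedTuple):
    class_id: int      # index into the score vector; 0 is background
    confidence: float
    box: Box


def prefilter(class_scores: Sequence[Sequence[float]], boxes: Sequence[Box],
              conf_threshold: float = 0.5, top_k: int = 200) -> List[Detection]:
    """Steps before NMS: arg-max class, drop background, drop low confidence, keep top_k."""
    kept: List[Detection] = []
    for scores, box in zip(class_scores, boxes):
        class_id = max(range(len(scores)), key=scores.__getitem__)
        confidence = scores[class_id]
        if class_id == 0 or confidence < conf_threshold:
            continue  # background box, or confidence below the threshold
        kept.append(Detection(class_id, confidence, box))
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return kept[:top_k]  # these candidates would then go to NMS
```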

A power line foreign matter aerial image is input and passed through 10 convolutional layers with 3 × 3 kernels; the first 6 convolutional layers form the backbone network that extracts the features of foreign matter targets on the power line, with a pooling operation after the 2nd, 4th and 6th convolutional layers; layers 7 and 8 each use a 3 × 3 kernel with batch normalization, which normalizes the input of the subsequent activation function so that each batch follows a standard normal distribution and the values are more stable, followed by a ReLU activation, which speeds up model convergence; a max pooling layer with a stride of 2 is added after layers 7 and 8 respectively, halving the length and width of the input features; convolutional layers 7, 8, 9 and 10 serve as prediction modules, each containing two 3 × 3 convolutional layers for class prediction and bounding box prediction, so that predictions at different scales are retained across multiple layers; each prediction output is then converted into a two-dimensional array whose first dimension is the number of samples and whose second dimension holds the channel values, and all outputs are concatenated along the second dimension to combine the multi-scale predictions; each prediction box is assigned a category according to its computed class prediction value, and boxes belonging to the background are discarded; boxes whose confidence is below the threshold of 0.5 are then filtered out, and the 200 boxes with the highest confidence are retained; finally, the NMS algorithm removes prediction boxes whose IoU with a higher-confidence box exceeds 0.7, yielding the prediction result;
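To illustrate how predictions from different scales can be combined into one two-dimensional array, the sketch below flattens each (N, C, H, W) output to (N, H·W·C) and concatenates the results along the second dimension. Plain nested lists stand in for tensors, and the channel-last flattening order is an assumption (a common choice in SSD-style implementations, not stated in the patent).

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]  # nested lists indexed [n][c][h][w]


def flatten_pred(pred: Tensor4D) -> List[List[float]]:
    """(N, C, H, W) -> (N, H*W*C); channel-last, so each location's values stay adjacent."""
    flat = []
    for sample in pred:  # sample is indexed [c][h][w]
        channels, height, width = len(sample), len(sample[0]), len(sample[0][0])
        flat.append([sample[c][h][w]
                     for h in range(height)
                     for w in range(width)
                     for c in range(channels)])
    return flat


def concat_preds(preds: List[Tensor4D]) -> List[List[float]]:
    """Flatten each scale's output, then concatenate along the second dimension."""
    flattened = [flatten_pred(p) for p in preds]
    batch_size = len(flattened[0])
    return [[v for f in flattened for v in f[n]] for n in range(batch_size)]


# Two toy scales for one sample: a 2-channel 2x2 map and a 2-channel 1x1 map.
scale_a = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]]
scale_b = [[[[9]], [[10]]]]
print(concat_preds([scale_a, scale_b]))  # [[1, 5, 2, 6, 3, 7, 4, 8, 9, 10]]
```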

Step 2: neural network pre-training

The light power line detection model is trained on the power line image dataset, and the power line foreign matter detection model is trained on the power line foreign matter dataset; at detection time, images are first sent into the power line detection model to filter out irrelevant frames, and the key frames containing power lines are then sent into the power line foreign matter detection model, achieving real-time detection of foreign matter targets on power lines;

Step 2.1: target detection model pre-training

Step 2.1.1: constructing a pre-training data set

In the pre-training stage, a power line image dataset containing a number of power line aerial images and background aerial images is used to train the power line target detection model; the power line aerial images were taken in different regions and seasons, with an image size of 512 × 512; a power line foreign matter dataset containing aerial images of two kinds of foreign matter, balloons and kites, covering different angles, regions and backgrounds, is used to train the power line foreign matter detection model;

Step 2.1.2: model pre-training

In a power line foreign matter scene, a target can appear at any position in the image and at any size; the power line foreign matter model therefore uses default bounding boxes, i.e. anchor boxes, as the starting points of its search; anchor boxes are configured by two parameters, size and aspect ratio: for an input of size w × h, a given size s ∈ (0, 1) generates a bounding box of size ws × hs, and a given ratio r > 0 generates a bounding box of size ws√r × hs/√r; here s takes the values 0.1, 0.25 and 0.5, r takes the values 0.5, 1 and 2, and 5 default anchor boxes are sampled centered on each input pixel; during training, each ground-truth box in the training data is first assigned to an anchor box, and the bounding box predicted for that anchor box is trained toward the ground truth; each real object in the image is matched to the anchor box with which it has the largest intersection over union (IoU); the IoU is a value in [0, 1] that measures the overlap between two bounding boxes, as shown in equation (1):

$$\mathrm{IoU}(\alpha, \xi) = \frac{\lvert \alpha \cap \xi \rvert}{\lvert \alpha \cup \xi \rvert} \qquad (1)$$

where α is the predicted box and ξ is the ground-truth box; a large IoU indicates that the two boxes are very similar, and a small IoU indicates that they are dissimilar; each remaining unmatched anchor box is also matched to a ground-truth box if its IoU with that box is greater than the threshold of 0.5;
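The anchor generation and matching rules can be sketched in Python as follows. The patent gives s ∈ {0.1, 0.25, 0.5}, r ∈ {0.5, 1, 2} and 5 anchors per pixel, but not which 5 of the 9 size/ratio pairs are used; the sketch assumes the convention common in SSD tutorials of pairing every size with r = 1 and the first size with the other two ratios (3 + 3 - 1 = 5). Coordinates are normalized to [0, 1], so a width of s√r corresponds to ws√r pixels; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]


def anchors_at(cx: float, cy: float,
               sizes: Sequence[float] = (0.1, 0.25, 0.5),
               ratios: Sequence[float] = (1.0, 0.5, 2.0)) -> List[Box]:
    """Anchors centred at (cx, cy): every size with ratios[0], plus sizes[0] with the other
    ratios (len(sizes) + len(ratios) - 1 boxes). Width s*sqrt(r), height s/sqrt(r)."""
    pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    boxes = []
    for s, r in pairs:
        half_w, half_h = s * math.sqrt(r) / 2, s / math.sqrt(r) / 2
        boxes.append((cx - half_w, cy - half_h, cx + half_w, cy + half_h))
    return boxes


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes, equation (1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors: List[Box], truths: List[Box], threshold: float = 0.5) -> List[int]:
    """For each anchor, the index of its matched ground truth, or -1 if unmatched."""
    matches = [-1] * len(anchors)
    if not anchors or not truths:
        return matches
    # 1) Each ground truth claims the anchor with the largest IoU
    #    (simplification: if two ground truths pick the same anchor, the later one wins).
    for j, truth in enumerate(truths):
        best = max(range(len(anchors)), key=lambda i: iou(anchors[i], truth))
        matches[best] = j
    # 2) Each remaining anchor takes its best ground truth if that IoU exceeds the threshold.
    for i, anchor in enumerate(anchors):
        if matches[i] != -1:
            continue
        j = max(range(len(truths)), key=lambda k: iou(anchor, truths[k]))
        if iou(anchor, truths[j]) > threshold:
            matches[i] = j
    return matches


print(len(anchors_at(0.5, 0.5)))  # 5
```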

In both the power line detection model and the power line foreign matter detection model, the loss function L(x, c, l, g) is defined as a weighted sum of the localization error and the confidence error, as shown in equation (2), where x is the input training image, c is the class confidence prediction, l is the predicted bounding box corresponding to an anchor box, g is the position parameter of the ground truth, N is the number of positive anchor-box samples, and α is the adjustment ratio between the two loss terms, set to 1;

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \qquad (2)$$

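$L_{loc}(x, l, g)$ is the loss function of the bounding box prediction, as shown in equation (3), where cx, cy are the center coordinates of a bounding box and w, h are its width and height; an anchor box position is written as $d = (d^{cx}, d^{cy}, d^{w}, d^{h})$ and the corresponding bounding box as $b = (b^{cx}, b^{cy}, b^{w}, b^{h})$:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_i^m - \hat{g}_j^m\right) \qquad (3)$$

The offsets of the bounding box relative to the anchor box are calculated according to equations (4), (5), (6) and (7):

$$\hat{g}_j^{cx} = \frac{b_j^{cx} - d_i^{cx}}{d_i^{w}} \qquad (4)$$

$$\hat{g}_j^{cy} = \frac{b_j^{cy} - d_i^{cy}}{d_i^{h}} \qquad (5)$$

$$\hat{g}_j^{w} = \log\frac{b_j^{w}}{d_i^{w}} \qquad (6)$$

$$\hat{g}_j^{h} = \log\frac{b_j^{h}}{d_i^{h}} \qquad (7)$$

$l_i^m$ is the predicted value of parameter m of the bounding box corresponding to the i-th anchor box; Pos is the positive sample set, i is the anchor box index, and j is the ground-truth index; $x_{ij}^{k} \in \{0, 1\}$ indicates whether the i-th anchor box is matched to the j-th ground truth of category k, with $x_{ij}^{k} = 0$ indicating a mismatch; the Smooth L1 function is used for the localization error;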
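Equations (3) to (7) can be sketched in Python as below, assuming center-format boxes (cx, cy, w, h) and a precomputed anchor-to-ground-truth match list in which -1 marks an unmatched anchor. The names and data layout are hypothetical; the offsets follow equations (4) to (7) as written, without any extra scaling constants.

```python
import math
from typing import Sequence, Tuple

CenterBox = Tuple[float, float, float, float]  # (cx, cy, w, h)


def encode(box: CenterBox, anchor: CenterBox) -> Tuple[float, float, float, float]:
    """Offsets of a box relative to its anchor box, equations (4)-(7)."""
    bcx, bcy, bw, bh = box
    dcx, dcy, dw, dh = anchor
    return ((bcx - dcx) / dw, (bcy - dcy) / dh, math.log(bw / dw), math.log(bh / dh))


def smooth_l1(x: float) -> float:
    """Smooth L1: 0.5 * x**2 when |x| < 1, otherwise |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def localization_loss(pred_offsets: Sequence[Sequence[float]], anchors: Sequence[CenterBox],
                      truths: Sequence[CenterBox], matches: Sequence[int]) -> float:
    """Equation (3): Smooth L1 summed over the 4 offsets of every positive anchor.

    matches[i] is the index j of the ground truth matched to anchor i, or -1 (x_ij = 0).
    """
    total = 0.0
    for i, j in enumerate(matches):
        if j < 0:
            continue  # unmatched anchor contributes nothing
        target = encode(truths[j], anchors[i])
        total += sum(smooth_l1(p - t) for p, t in zip(pred_offsets[i], target))
    return total
```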

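$L_{conf}(x, c)$ is the loss function of the class prediction, where x is the input image, Neg is the negative sample set, o is the anchor box index within a sample, and t is the ground-truth index; $x_{ot}^{p} \in \{0, 1\}$ describes the matching state, with $x_{ot}^{p} = 0$ indicating a mismatch, as shown in equation (8):

$$L_{conf}(x, c) = -\sum_{o \in Pos}^{N} x_{ot}^{p} \log \hat{c}_o^{p} - \sum_{o \in Neg} \log \hat{c}_o^{0}, \qquad \hat{c}_o^{p} = \frac{\exp(c_o^{p})}{\sum_{p} \exp(c_o^{p})} \qquad (8)$$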
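A hedged Python sketch of equations (8) and (2) follows. The patent does not say how the negative set Neg is chosen (the original SSD uses hard negative mining), so the sketch takes the negative anchor indices as an input; returning 0 when N = 0 is likewise an assumption taken from the original SSD formulation. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """log of c_hat = softmax(logits), computed stably by subtracting the maximum."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def confidence_loss(logits: Sequence[Sequence[float]], labels: Sequence[int],
                    negatives: Sequence[int]) -> float:
    """Equation (8): softmax cross-entropy over positives plus the background term over Neg.

    logits[o]: raw class scores c_o of anchor o (index 0 is background).
    labels[o]: class p of the ground truth matched to anchor o, or 0 if unmatched.
    negatives: indices of the anchors in the negative set Neg.
    """
    loss = 0.0
    for o, p in enumerate(labels):
        if p > 0:  # positive anchor, x_ot^p = 1
            loss -= log_softmax(logits[o])[p]
    for o in negatives:
        loss -= log_softmax(logits[o])[0]
    return loss


def total_loss(conf: float, loc: float, num_positives: int, alpha: float = 1.0) -> float:
    """Equation (2): (L_conf + alpha * L_loc) / N; taken as 0 when N = 0."""
    if num_positives == 0:
        return 0.0
    return (conf + alpha * loc) / num_positives
```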

Training is then carried out by minimizing the loss function with stochastic gradient descent (SGD): predictions are computed from the feature maps of several convolutional layers at different scales, and the prediction outputs of the different layers are combined; because pre-training requires all inputs to have the same size, the original images are resized to 512 × 512 pixels; the learning rate is set to $10^{-3}$, the momentum parameter to 0.99, and the weight decay factor to its default of 0.0005, and SGD training is accelerated on an NVIDIA TITAN Xp GPU for more than 60,000 iterations.
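For reference, a single stochastic gradient descent step with the stated hyperparameters (learning rate 10^-3, momentum 0.99, weight decay 0.0005) can be written as follows. The update form (weight decay added to the gradient, then a momentum buffer) is the one used by common frameworks such as PyTorch's `torch.optim.SGD` without dampening or Nesterov momentum; the patent does not name its framework, so this form is an assumption.

```python
from typing import List


def sgd_momentum_step(params: List[float], grads: List[float], velocity: List[float],
                      lr: float = 1e-3, momentum: float = 0.99,
                      weight_decay: float = 5e-4) -> None:
    """One in-place SGD step with momentum and L2 weight decay.

    g <- grad + weight_decay * w ;  v <- momentum * v + g ;  w <- w - lr * v
    """
    for k in range(len(params)):
        g = grads[k] + weight_decay * params[k]
        velocity[k] = momentum * velocity[k] + g
        params[k] -= lr * velocity[k]


w, v = [1.0], [0.0]
sgd_momentum_step(w, [2.0], v)
print(w, v)
```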

2. The detection method according to claim 1, characterized in that:

are the initial power line boundary and class predictions, and c₁, l₁ are the final power line boundary prediction value and category prediction value;

represent the power line detection model network parameters, where u ∈ (0, 15) is the parameter iteration index;

and the category prediction values

3) Take

and

4) According to

train the power line detection network with SGD and update the parameters to

5) According to

6) Repeat steps 2-5 15 times to obtain the final pre-trained parameters β₁, c₁, l₁ of the power line detection model;

represent the power line foreign matter detection model network parameters, where u ∈ (0, 15) is the parameter iteration index;

and the category prediction values

3) Take

and

4) According to

5) According to

6) Repeat steps 2-5 15 times to obtain the final pre-trained parameters β₂, c₂, l₂ of the power line foreign matter detection model.

3. The detection method according to claim 1, characterized in that:

Step 3: power line foreign object identification

The power line detection model is applied first: frames without power lines are not processed further, and only frames containing power lines proceed to power line foreign matter detection, which improves the overall detection speed;
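The two-stage cascade can be sketched as a loop that skips the foreign matter detector whenever no power line is found. The two detectors are passed in as hypothetical callables returning lists of boxes; they stand in for the trained models and are not an API from the patent.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def cascade(frames: Iterable[Frame],
            detect_lines: Callable[[Frame], List[Box]],
            detect_foreign: Callable[[Frame], List[Box]]) -> List[Tuple[int, List[Box], List[Box]]]:
    """Run the (heavier) foreign matter detector only on key frames with power lines.

    Returns (frame_index, power_line_boxes, foreign_matter_boxes) for each key frame.
    """
    results = []
    for index, frame in enumerate(frames):
        line_boxes = detect_lines(frame)
        if not line_boxes:
            continue  # no power line: frame is skipped, foreign detector never runs
        results.append((index, line_boxes, detect_foreign(frame)))
    return results
```

In use, `detect_lines` would wrap the power line detection model with its post-processing, and `detect_foreign` the foreign matter detection model.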

Step 3.1: power line target detection

First, a video frame is input into the power line target detection model, which outputs the bounding box prediction values and class prediction values; a non-maximum suppression algorithm then retains the bounding boxes with higher confidence, and finally these boxes are drawn;

Step 3.1.1: power line target class and boundary prediction

The image x_i to be detected is fed into the power line target detection model, which outputs the predicted boundary value c₁ and the class prediction value l₁; since each pixel generates several anchor boxes, a large number of similar boxes are predicted;

Step 3.1.2: optimization of the power line target class and boundary prediction results

For the large number of similar boxes computed in step 3.1.1, non-maximum suppression is used to suppress redundant boxes: all boxes are sorted by confidence, the box with the highest confidence is selected, and every remaining box whose IoU with it is greater than the threshold of 0.8 is deleted; this process is repeated on the remaining boxes, so that only the higher-confidence boxes are kept; finally, among the boxes remaining after non-maximum suppression, those with a confidence exceeding 0.6 are drawn as the final boxes;
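The greedy non-maximum suppression procedure and the 0.6 drawing threshold described above can be sketched in plain Python as follows, assuming corner-format boxes; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.8) -> List[int]:
    """Greedy NMS: indices of the kept boxes, in descending order of confidence."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-confidence box still remaining
        keep.append(best)
        # delete every remaining box that overlaps the selected one by more than the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


def boxes_to_draw(boxes: Sequence[Box], scores: Sequence[float],
                  iou_threshold: float = 0.8, draw_threshold: float = 0.6) -> List[int]:
    """NMS followed by the drawing rule: keep only boxes whose confidence exceeds 0.6."""
    return [i for i in nms(boxes, scores, iou_threshold) if scores[i] > draw_threshold]
```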

Step 3.2: power line foreign object target detection

The key frames containing power lines obtained in step 3.1 are input into the power line foreign object target detection model, which computes the foreign object boundary prediction values and class prediction values; foreign object bounding boxes are drawn for the key frames that contain foreign objects, each is checked for overlap with the power line bounding boxes, and finally the overlapping bounding boxes are drawn;

Step 3.2.1: power line foreign object class and boundary prediction

The image x_i to be detected is fed into the power line foreign object detection model, which outputs the predicted boundary value c₂ and the class prediction value l₂; since each pixel generates several anchor boxes, a large number of similar boxes are predicted;

For the large number of similar boxes computed in step 3.2.1, non-maximum suppression is used to suppress redundant boxes, and boxes with a confidence exceeding 0.6 are retained; the predicted foreign matter boxes are then compared with the predicted power line boxes, foreign matter boxes whose IoU with the power line boxes is 0 are deleted, and the remaining boxes are drawn;
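The final overlap check can be sketched as a filter that drops any foreign matter box with zero IoU against the power line boxes. When a frame contains several power line boxes, the sketch assumes a foreign matter box is kept if it overlaps at least one of them, which the patent does not spell out; the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def keep_overlapping(foreign_boxes: Sequence[Box], line_boxes: Sequence[Box]) -> List[Box]:
    """Delete foreign matter boxes whose IoU with every power line box is 0."""
    return [f for f in foreign_boxes if any(iou(f, line) > 0.0 for line in line_boxes)]


power_lines = [(0.0, 40.0, 100.0, 60.0)]
candidates = [(10.0, 30.0, 20.0, 45.0), (70.0, 0.0, 80.0, 10.0)]
print(keep_overlapping(candidates, power_lines))  # [(10.0, 30.0, 20.0, 45.0)]
```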

Step 3.3: evaluation of detection results

The bounding-box prediction results are evaluated with a criterion based on the mean absolute error (MAE), computed from the absolute error of each prediction:

$$e_i = \lvert f_i - y_i \rvert \qquad (9)$$

where $f_i$ is the predicted value, $y_i$ is the true value, and $e_i$ is the absolute error; the MAE is the mean of $e_i$ over all $n$ predictions, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} e_i$.