CN113378672A

CN113378672A - Multi-target detection method for defects of power transmission line based on improved YOLOv3

Info

Publication number: CN113378672A
Application number: CN202110600438.3A
Authority: CN
Inventors: Han Heng (韩恒); Chen Wanpei (陈万培); Zhang Tao (张涛); Gao Shen (高绅); Yang Qinrong (杨钦榕)
Original assignee: Yangzhou University
Current assignee: Yangzhou University
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-09-10

Abstract

The invention discloses a multi-target detection method for power transmission line defects based on improved YOLOv3, comprising the following steps: step one, screening the original images and selecting target images that meet the requirements; step two, augmenting the images obtained in step one to obtain a target data set; step three, after augmentation, preprocessing part of the images in the target data set with piecewise linear gray-level transformation, histogram equalization, homomorphic filtering and smoothing-based denoising; step four, sorting and labeling the target data set preprocessed in step three; step five, improving YOLOv3 with an attention-based feature fusion (Attention-Fusion) mechanism; and step six, training the improved algorithm on the target data set and using it to detect pictures.

Description

Multi-target detection method for defects of power transmission line based on improved YOLOv3

Technical Field

The invention relates to the technical field of target detection and recognition, and in particular to a multi-target detection method for power transmission line defects based on improved YOLOv3.

Background

Transmission lines are divided into overhead transmission lines and cable lines. An overhead transmission line consists of towers, conductors, fittings, insulators, stay wires, grounding devices and the like, and is distributed widely over varied terrain such as fields, urban areas, deserts and lakes. Because these lines operate outdoors for long periods and are exposed to extreme weather such as gales, rainstorms and strong sunlight, components such as conductors, fittings and insulators are prone to defects such as corrosion, damage and broken strands. In addition, non-standard installation of components poses hidden risks to the safe operation of the line.

With the development and application of UAV-based defect identification schemes for transmission lines, the volume of images acquired through UAV inspection is growing exponentially, and the drawbacks of traditional manual inspection are becoming increasingly apparent. Using computers for intelligent defect identification on inspection images further raises the requirements on the professional skills of personnel. UAVs now account for an increasing share of power inspection work, and UAV inspection is becoming more intelligent and automated; the future direction of power inspection is for UAV inspection to cover all operating scenarios.

Power computer vision (Power CV) is a sub-field of power artificial intelligence. It uses machine learning, pattern recognition, digital image processing and related techniques, combined with domain knowledge of power engineering, to solve visual problems in every link of the power system, covering the whole "transmission and transformation" chain. Camera monitoring equipment installed along the lines and UAV inspection photograph the inspected lines and produce large amounts of video and images, which can only be analyzed well with knowledge of the power system. Automatically identifying defects in these massive image collections is difficult because images of transmission lines have pronounced multi-scale structural characteristics: on the one hand, images taken by UAVs at close range have complex backgrounds, and lighting can cause a high misjudgment rate; on the other hand, when the UAV shoots from different angles, heavy occlusion occurs, which makes separating local contour structures a difficult task.

Helicopter inspection initially used a "super-red" method, which identified rusty parts in images taken manually from helicopters by means of least-squares fitting and geometric features. However, this method has limited recognition accuracy and slow detection speed. Later, helicopters carried thermal infrared imagers to capture real-time infrared video sequences, and defective regions in the images were located with methods such as the Hough transform, Otsu's adaptive thresholding and SIFT feature matching. With continued technological progress, semi-manual helicopter inspection can no longer meet the development needs of smart grids.

In recent years, driven by a new generation of artificial intelligence technology represented by deep learning, defect identification algorithms for inspection images have continued to advance and are gradually being applied in intelligent UAV inspection projects for overhead transmission lines. As CNN-based object detection algorithms such as R-CNN, Faster R-CNN and YOLO mature and hardware computing power improves further, these algorithms are showing distinct advantages in power computer vision. One prior approach proposed an improved Faster R-CNN algorithm, trained it on a self-built equipment sample library and applied it to target detection in power inspection images, improving the model's detection accuracy and speed; however, its accuracy on small targets is low and real-time performance cannot be guaranteed.

By comparison, one-stage detectors represented by the CNN-based YOLO algorithm are markedly faster than two-stage detectors based on RoI (region of interest) such as Faster R-CNN while maintaining high recognition accuracy. They can meet the real-time requirements of the system and are therefore better suited to industrial applications.

Disclosure of Invention

To address these shortcomings of the prior art, the invention provides a multi-target detection method for power transmission line defects based on improved YOLOv3. The original FPN feature fusion is replaced by an attention-based fusion (Attention-Fusion) scheme, and neural network training techniques such as a cosine learning rate and synchronized normalization are used during training; these training techniques noticeably improve the performance of YOLOv3 without changing the network architecture or adding inference cost. The improved algorithm is then trained on a self-built data set to achieve multi-target identification of transmission line defects.

The purpose of the invention is realized as follows: a multi-target detection method for defects of power transmission lines based on improved YOLOv3 comprises the following steps:

Step one: screening data set images. The obtained original images are screened purposefully; each image must contain at least one of six types of objects (ground wires, vibration dampers, bird nests, signboards, ground wire clamps and spacers), and target images meeting the requirements are preliminarily selected;

Step two: augmenting the images obtained in step one. The screened images are processed with data augmentation, including translation, rotation, flipping, scaling-and-cropping, and combined rotation-translation transformation, with the translation distance and rotation angle chosen at random, to obtain a target data set;

Step three: after data augmentation is completed, part of the images in the target data set are preprocessed with piecewise linear gray-level transformation, histogram equalization, homomorphic filtering and smoothing-based denoising;

Step four: the target data set preprocessed in step three is sorted and labeled: the pictures are renamed in batches and annotated in batches;

Step five: YOLOv3 is improved by introducing attention-based feature fusion (Attention-Fusion), yielding the improved algorithm;

Step six: the improved algorithm is trained on the labeled target data set and then used to detect the pictures to be examined.

Preferably, the algorithm improvement in the step five specifically includes:

Attention-Fusion: for any given transformation, the input feature maps X1 and X2 are each passed through a 1 × 1 convolution, yielding T1 and T2, where H denotes the height of the feature map, W its width, and C1 and C2 the numbers of channels;

T1 and T2 are fed into a global average pooling operation that squeezes the H × W spatial dimensions, so that each feature becomes, in a sense, a vector with a global receptive field whose length matches the number of input channels. Denoting the pooled vectors by Z1 and Z2, for each channel c:

$$Z_k(c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} T_k(i, j, c), \qquad k = 1, 2$$

A fully connected layer is then applied; instead of a conventional fully connected operation, it is implemented as a convolution with a 1 × 1 kernel and a stride of 1, giving S1 and S2;

S1 and S2 are added to obtain P, which re-aggregates the original features along the channel dimension:

$$P = S_1 + S_2 \tag{4.19}$$

P is passed through a Sigmoid function, and each output weight is regarded as the importance of the corresponding fused feature channel. Each channel of the X1 and X2 features is then weighted through matrix operations, so that the original features are recalibrated and fused along the channel dimension to give a new feature Y:

$$Y = (X_1 + X_2) \cdot \mathrm{Sigmoid}(P)$$

Preferably, in step six, a cosine learning rate and synchronized normalization are used during training;

the cosine learning rate processing specifically comprises:

when the objective function is optimized with a gradient descent algorithm, a cosine function is used to decay the learning rate; the learning rate varies with the training epoch as follows:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{\max}}\pi\right)\right)$$

where $\eta_{\min}$ and $\eta_{\max}$ define the range of the learning rate, $T_{cur}$ is the number of epochs executed so far, and $T_{\max}$ is the total number of epochs. In actual training, the optimizer's total iteration counter and $T_{cur}$ are reset (re-initialized) each time the corresponding epoch is reached.

Compared with the prior art, the invention has the advantages that:

1. Image augmentation increases the number of samples in the data set, which reduces network overfitting and improves the robustness and detection accuracy of the detection algorithm;

2. Part of the sample pictures are preprocessed: piecewise linear gray-level transformation highlights the gray-level interval of the region of interest; histogram equalization addresses over- and under-exposure; homomorphic filtering removes uneven illumination; and smoothing-based denoising removes image noise caused by external factors. Combined, the four methods make the images clearer and their features more pronounced, increasing the value of the images. In particular, the features of small targets against the background are noticeably enhanced and sharpened, which improves small-target detection accuracy;

3. A cosine learning rate and synchronized normalization are used as training techniques. Decaying the learning rate with a cosine function helps the optimization approach the global minimum of the loss, and synchronized normalization solves the problem of the BN layer failing during multi-GPU training. With these improvements, the feature extraction capability of the network is noticeably improved, detection results are enhanced, and training time is reduced.

4. The Attention-Fusion scheme replaces Concat fusion as the feature fusion method. Borrowing the idea of the attention mechanism, it models the relationships between feature channels, further increasing the nonlinearity of the network, highlighting key information, suppressing irrelevant information and reducing information redundancy. This strengthens the representational power of the fused feature maps and alleviates the problems caused by overlapping samples.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 shows the development environment configuration of the invention.

Fig. 2 shows part of the network parameter settings of the invention.

Fig. 3 is a flow chart of the multi-target detection method for power transmission line defects.

Fig. 4 shows example images from the data set.

Fig. 5 shows examples of the image data augmentation methods.

Fig. 6 shows the result of piecewise linear transformation in image preprocessing.

Fig. 7 shows the result of histogram equalization in image preprocessing.

Fig. 8 shows the result of smoothing in image preprocessing.

Fig. 9 shows examples of the data set categories.

Fig. 10 shows the LabelImg user interface.

Fig. 11 shows the data set annotation results.

Fig. 12 is a structural diagram of the improved YOLOv3.

Fig. 13 is a diagram of the attention-based fusion structure used in the algorithm improvement.

Fig. 14 compares the recognition performance of different algorithms.

Fig. 15 shows detection results for multiple types of defect targets on power transmission lines.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The configuration of the deep learning training platform used by the invention is shown in Fig. 1, and part of the network parameter settings are shown in Fig. 2.

FIG. 3 is a flow chart of the multi-target detection method for defects of power transmission lines, which comprises the following steps:

Step one: data set image screening

The power transmission line data set studied here contains six types of objects: spacers, vibration dampers, bird nests, signboards, ground wires and ground wire clamps. These six types of components are important parts of the transmission line and an essential part of power inspection. All images are screened, and pictures containing these six types of components are selected as the data set, as shown in Fig. 4.

Step two: image augmentation

Five processing modes are applied to the sample pictures on the MATLAB platform: translation, rotation, flipping, scaling-and-cropping, and combined rotation-translation transformation. The translation distance and rotation angle are chosen at random, which increases sample diversity and yields a large sample data set; the image data augmentation scheme is shown in Fig. 5.

In practice, the augmentation operations use the image center as the reference point by default. Mathematically, an operation about that point can be divided into the following steps:

1. move the rotation point to the origin;

2. rotate about the origin;

3. move the rotation point back to its original position.
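The three steps above amount to sandwiching a rotation between two translations. The following NumPy sketch builds that composite matrix, assuming homogeneous column vectors and a counterclockwise angle in degrees; the function names and the 640 × 480 image size are illustrative only and are not taken from the original MATLAB implementation.

```python
import numpy as np


def translation(dx: float, dy: float) -> np.ndarray:
    """3x3 homogeneous translation matrix."""
    return np.array([[1.0, 0.0, dx],
                     [0.0, 1.0, dy],
                     [0.0, 0.0, 1.0]])


def rotation(theta_deg: float) -> np.ndarray:
    """3x3 homogeneous rotation about the origin (degrees, counterclockwise in a y-up frame)."""
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])


def rotation_about(cx: float, cy: float, theta_deg: float) -> np.ndarray:
    """Step 1: move (cx, cy) to the origin; step 2: rotate; step 3: move it back.
    With column vectors the right-most factor is applied first."""
    return translation(cx, cy) @ rotation(theta_deg) @ translation(-cx, -cy)


# Hypothetical 640 x 480 image rotated by 30 degrees about its center.
H = rotation_about(320.0, 240.0, 30.0)
center = H @ np.array([320.0, 240.0, 1.0])   # the rotation center stays in place
corner = H @ np.array([0.0, 0.0, 1.0])       # other points move around it
```

Because the rotation point is the fixed point of the composite matrix, the same helper covers rotation about any chosen center, not only the image center.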

Let the original homogeneous coordinates of a pixel be $[x_0, y_0, 1]^T$ and the coordinates after translation be $[x, y, 1]^T$. The coordinates before and after translation are related by

$$x = x_0 + d_x, \qquad y = y_0 + d_y$$

Image translation moves every pixel by the same amount in the x and y directions, and the corresponding matrix form is

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}$$

where $d_x$ and $d_y$ are the distances moved in the horizontal and vertical directions, respectively.

Image rotation turns the image by an arbitrary angle about a specified center point (by default the image center). Taking that point as the origin, the rotation matrix is

$$\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where θ is the rotation angle (in degrees rather than radians).

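Image flipping includes horizontal and vertical flipping. With the coordinate origin at the image center, horizontal flipping is expressed as

$$\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

and vertical flipping as

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$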

In deep learning tasks, a common way to crop an image is to first scale the original image by a certain factor (1.1 in this system) and then crop the scaled image. The scaling matrix is

$$\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad s_x = s_y = 1.1$$

In deep learning tasks, data augmentation usually combines several transformations, and matrix algebra shows that different orders of combination give different results. To illustrate this, let the translation matrix be $H_{shift}$ and the rotation matrix be $H_{rotate}$. Combined translation-rotation augmentation is mainly used, which gives two different combined transformations.

First, translating and then rotating gives the combined transformation matrix

$$H = H_{rotate}\, H_{shift}$$

Second, rotating and then translating gives

$$H = H_{shift}\, H_{rotate}$$
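A quick numerical check shows why the order matters. This is a minimal NumPy sketch, assuming homogeneous column vectors (so the transformation applied first is the right-hand factor); the helper names mirror $H_{shift}$ and $H_{rotate}$, while `scale_then_center_crop_box` and the 640 × 480 size are hypothetical illustrations of the 1.1× scale-then-crop step.

```python
import numpy as np


def H_shift(dx: float, dy: float) -> np.ndarray:
    """Translation matrix (homogeneous coordinates, column vectors)."""
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


def H_rotate(theta_deg: float) -> np.ndarray:
    """Rotation about the origin, angle in degrees."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def scale_then_center_crop_box(w: int, h: int, s: float = 1.1) -> tuple:
    """Scale a w x h image by s, then return the (x0, y0, x1, y1) box that crops
    the original size out of the center of the enlarged image."""
    sw, sh = round(w * s), round(h * s)
    x0, y0 = (sw - w) // 2, (sh - h) // 2
    return x0, y0, x0 + w, y0 + h


p = np.array([0.0, 0.0, 1.0])
# Translate first, then rotate: the shift acts on the point first, so it is the right-hand factor.
shift_then_rotate = H_rotate(90.0) @ H_shift(10.0, 0.0)
# Rotate first, then translate.
rotate_then_shift = H_shift(10.0, 0.0) @ H_rotate(90.0)

print(shift_then_rotate @ p)   # approximately [0, 10, 1]
print(rotate_then_shift @ p)   # approximately [10, 0, 1]
print(scale_then_center_crop_box(640, 480))
```

The same point ends up in two different places, so an augmentation pipeline should fix the composition order explicitly rather than leave it implicit.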

step three: image pre-processing

1) Piecewise linear gray-level transformation

The processing results are shown in Fig. 6: the gray-level range of the region of interest is highlighted, while gray-level ranges that are not of interest are relatively suppressed.

The mathematical expression of piecewise linear transformation is:

where the gray-level interval [a, b] is linearly stretched, while the intervals [0, a] and [b, $f_{\max}$] are compressed.
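A three-segment transform of this kind can be sketched with `numpy.interp`, which is exactly piecewise linear between breakpoints. The output levels `c`, `d` and `g_max` are assumptions (the original does not give the target interval), and the numeric settings are hypothetical.

```python
import numpy as np


def piecewise_linear(img, a, b, c, d, f_max=255, g_max=255):
    """Three-segment gray-level transform through the points
    (0, 0), (a, c), (b, d), (f_max, g_max).
    When (d - c) > (b - a), the interval of interest [a, b] is stretched and
    the outer intervals [0, a] and [b, f_max] are compressed."""
    f = np.asarray(img, dtype=np.float64)
    g = np.interp(f, [0, a, b, f_max], [0, c, d, g_max])
    return np.clip(np.rint(g), 0, g_max).astype(np.uint8)


# Hypothetical settings: stretch gray levels 80..160 onto 30..225.
out = piecewise_linear(np.array([0, 80, 100, 160, 255]), a=80, b=160, c=30, d=225)
```

Here the middle segment has slope 195/80 > 1 (stretching), while both outer segments have slopes below 1 (compression).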

2) Histogram equalization

As shown in Fig. 7, histogram equalization spreads the gray levels of the image so that they are distributed uniformly or pulled further apart, which increases contrast and makes the picture clearer.

Assume that the gray level of the original image at (x, y) is f, with values in [0, L-1], where f = 0 is black and f = L-1 is white, and that the gray level after equalization is j. The transformation can be described as

$$j(x, y) = T[f(x, y)], \qquad 0 \le f \le L-1$$

where the transformation T must satisfy two conditions: (1) T(r) is strictly monotonically increasing on the gray-level interval [0, L-1]; (2) 0 ≤ T(r) ≤ L-1 for 0 ≤ r ≤ L-1, with L ≤ 256.

The cumulative distribution function (CDF) satisfies exactly these two conditions; its mathematical expression is

$$s = T(r) = (L-1)\int_{0}^{r} p_r(\omega)\, d\omega$$

where ω is a dummy variable of integration.

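The probability density function $p_s(s)$ of the transformed random variable s is found from

$$p_s(s) = p_r(r)\left|\frac{dr}{ds}\right|, \qquad \frac{ds}{dr} = (L-1)\, p_r(r)$$

which further gives

$$p_s(s) = \frac{1}{L-1}, \qquad 0 \le s \le L-1$$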

Thus, although the equalization transformation T(r) depends on $p_r(r)$, the resulting $p_s(s)$ is always a uniform distribution, independent of the form of $p_r(r)$. Because image pixel values are discrete, the discrete form of the cumulative distribution function is

$$s_k = T(r_k) = \frac{L-1}{MN}\sum_{j=0}^{k} n_j$$

where 0 ≤ k ≤ L-1, MN is the total number of image pixels, and $n_k$ is the number of pixels with gray level $r_k$. The equalized gray value $s_k$ of each pixel can therefore be computed directly from the histogram of the original image.
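The discrete mapping $s_k = \frac{L-1}{MN}\sum_{j=0}^{k} n_j$ translates directly into a lookup table built from the histogram. A minimal NumPy sketch, assuming an integer-valued gray image with levels in [0, L-1] and L ≤ 256; the function name and the tiny 2 × 2 example are hypothetical.

```python
import numpy as np


def equalize_hist(img, L=256):
    """Discrete histogram equalization s_k = (L-1)/(MN) * sum_{j<=k} n_j,
    applied as a lookup table r_k -> s_k. Assumes integer gray levels in [0, L-1], L <= 256."""
    img = np.asarray(img)
    n = np.bincount(img.ravel(), minlength=L)         # n_k: number of pixels with gray level r_k
    cdf = np.cumsum(n) / img.size                     # img.size = MN
    s = np.rint((L - 1) * cdf).astype(np.uint8)       # s_k for every gray level
    return s[img]


tiny = np.array([[0, 0], [1, 3]], dtype=np.uint8)
print(equalize_hist(tiny, L=4))
```

Since the table is computed once per image from its histogram, the per-pixel cost is a single array lookup.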

3) Homomorphic filtering

During image capture, uneven illumination of components gives some images a large gray-level dynamic range with strong black-and-white contrast, so that details are hard to see; ordinary piecewise linear gray-level transformation cannot solve this problem. Homomorphic filtering can remove the adverse effects of uneven illumination and enhance image details.
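The original does not specify its homomorphic filter, so the following is a sketch of the common textbook variant, assuming a Gaussian high-emphasis transfer function $H(u,v) = (\gamma_H - \gamma_L)\left[1 - e^{-c D^2 / D_0^2}\right] + \gamma_L$. The parameter values `gamma_l`, `gamma_h`, `c`, `d0` and the synthetic test scene are hypothetical choices, not the values used in the invention.

```python
import numpy as np


def homomorphic_filter(img, gamma_l=0.5, gamma_h=2.0, c=1.0, d0=30.0):
    """log -> FFT -> Gaussian high-emphasis filter -> inverse FFT -> exp.
    gamma_l < 1 attenuates low frequencies (mostly illumination);
    gamma_h > 1 boosts high frequencies (mostly reflectance, i.e. detail)."""
    f = np.log1p(np.asarray(img, dtype=np.float64))   # log turns i*r into log i + log r
    rows, cols = f.shape
    u = np.arange(rows) - rows // 2                    # frequency grid centered like fftshift
    v = np.arange(cols) - cols // 2
    d2 = u[:, None] ** 2 + v[None, :] ** 2             # squared distance from the zero frequency
    H = (gamma_h - gamma_l) * (1.0 - np.exp(-c * d2 / d0 ** 2)) + gamma_l
    F = np.fft.fftshift(np.fft.fft2(f))
    g = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    out = np.expm1(g)                                  # back from the log domain
    out = (out - out.min()) / (out.max() - out.min() + 1e-12) * 255.0  # stretch to 0..255
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Hypothetical test image: a horizontal illumination gradient with a fine checkerboard on top.
x = np.linspace(20.0, 220.0, 64)
scene = np.tile(x, (64, 1)) * (1.0 + 0.1 * (np.indices((64, 64)).sum(axis=0) % 2))
result = homomorphic_filter(scene)
```

Working in the log domain is what separates the multiplicative illumination and reflectance components, so that a linear frequency filter can treat them differently.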

4) Smoothing-based denoising

The complex field environment and the noise introduced during image acquisition can seriously degrade image quality, so a suitable method is needed to remove their influence. Investigation shows that most of this noise consists of random signals whose effects on the image are independent; the images are therefore smoothed with low-pass filtering, and the result is shown in Fig. 8.

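Let the pixel to be processed be f(x, y) and the processed image be g(x, y). The smoothing can be described as

$$g(x, y) = \begin{cases} \dfrac{1}{Q}\sum_{(m,n) \in S} f(m, n), & \left| f(x, y) - \dfrac{1}{Q}\sum_{(m,n) \in S} f(m, n) \right| > T \\ f(x, y), & \text{otherwise} \end{cases}$$

where T ≥ 0 is a threshold and Q is the number of pixels in the neighborhood S.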
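A sketch of thresholded neighborhood averaging, assuming the common form in which a pixel is replaced by the mean of its neighborhood S only when it deviates from that mean by more than T. The 3 × 3 window (center included, so Q = 9), edge padding, T = 20 and the test patch are all hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def threshold_mean_smooth(img, T=20.0, k=3):
    """Neighborhood averaging with a threshold: f(x, y) is replaced by the mean of its
    k x k neighborhood S (Q = k * k pixels, center included) only if it differs from
    that mean by more than T; otherwise it is kept. Borders are handled by edge padding."""
    f = np.asarray(img, dtype=np.float64)
    padded = np.pad(f, k // 2, mode="edge")
    mean = sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))   # (1/Q) * sum over S
    g = np.where(np.abs(f - mean) > T, mean, f)
    return np.rint(g).astype(np.uint8)


# Hypothetical 5 x 5 patch with one impulse-noise pixel in the middle.
patch = np.zeros((5, 5), dtype=np.uint8)
patch[2, 2] = 90
smoothed = threshold_mean_smooth(patch)
```

Compared with plain mean filtering, the threshold leaves pixels that already agree with their neighborhood untouched, which limits the blurring of edges.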

Step four: data set sorting and labeling

1) A Python script is written to rename the pictures in batches with six-digit names (000000 to 999999).

2) The data set is organized into two groups, target detection and fault/foreign-object identification, covering six categories: ground wires, vibration dampers, bird nests, signboards, ground wire clamps and spacers, as shown in Fig. 9. The data set is randomly divided into training, test and validation sets according to a set ratio.

3) The data set is annotated in batches with the LabelImg tool, as shown in Figs. 10 and 11, which generates an XML annotation file for each image;

4) The images and XML files are organized and packaged into a data set folder.
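Steps 1) and 2) can be sketched with the standard library alone. The original script is not given, so the function names, the accepted file extensions and the 8:1:1 split ratio below are hypothetical; the validation set simply receives whatever remains after the training and test shares.

```python
from __future__ import annotations

import random
from pathlib import Path

IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def rename_six_digits(folder: str) -> list[Path]:
    """Rename the images in `folder` to 000000.jpg, 000001.jpg, ... in sorted order."""
    files = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTS)
    if len(files) > 1_000_000:
        raise ValueError("six-digit names allow at most 1,000,000 images")
    # Two passes, so that a new name never overwrites a file that has not been renamed yet.
    tmp = [p.rename(p.with_name(f"__tmp_{i}{p.suffix.lower()}")) for i, p in enumerate(files)]
    return [p.rename(p.with_name(f"{i:06d}{p.suffix}")) for i, p in enumerate(tmp)]


def split_dataset(names, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly split names into (train, test, validation) lists; validation gets the remainder."""
    names = list(names)
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * ratios[0])
    n_test = int(len(names) * ratios[1])
    return names[:n_train], names[n_train:n_train + n_test], names[n_train + n_test:]


train_set, test_set, val_set = split_dataset([f"{i:06d}" for i in range(100)])
```

A fixed `seed` keeps the split reproducible, which matters when the same XML annotations are reused across training runs.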

Step five: algorithm improvement

The improved YOLOv3 structure is shown in Fig. 12.

Attention-Fusion method:

Attention-Fusion alleviates the inconsistency problem by establishing a mechanism that strengthens the connections between different feature maps; its structure is shown in Fig. 13.

Unlike feature fusion by element-wise addition or channel-wise concatenation, the key idea of the invention is to use an attention mechanism to model the relationships between feature channels. The method has two main steps: feature attention extraction and feature fusion.

By modeling the interdependencies between the channels of the convolutional features of different feature maps, the method aims to improve the representational power of the network: it learns and uses global information across the feature maps to selectively emphasize key information and suppress useless information.

For any given transformation, Attention-Fusion passes the input feature maps X1 and X2 through separate 1 × 1 convolutions, yielding T1 and T2.

T1 and T2 are fed into a global average pooling operation that squeezes the H × W spatial dimensions, so that each feature becomes, in a sense, a vector with a global receptive field whose length matches the number of input channels. Denoting the pooled vectors by Z1 and Z2, for each channel c:

$$Z_k(c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} T_k(i, j, c), \qquad k = 1, 2$$

A fully connected layer follows; a convolution with a 1 × 1 kernel and a stride of 1 is used instead of a conventional fully connected operation, to reduce information redundancy and computation. This gives S1 and S2, which are added to obtain P:

$$P = S_1 + S_2 \tag{4.19}$$

P is passed through a Sigmoid function, and each output weight is regarded as the importance of the corresponding fused feature channel. Each channel of the X1 and X2 features is then weighted through matrix operations, so that the original features are recalibrated and fused along the channel dimension:

$$Y = (X_1 + X_2) \cdot \mathrm{Sigmoid}(P)$$
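The forward pass of Attention-Fusion can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: X1 and X2 are taken to have the same channel count C (so that X1 + X2 is defined), the pooling is global average pooling, each branch has a single C → C fully connected layer, and the random weights and 13 × 13 map size are hypothetical.

```python
import numpy as np


def conv1x1(x, w, b):
    """1x1 convolution with stride 1 on a (C_in, H, W) map; w: (C_out, C_in), b: (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_fusion(x1, x2, params):
    """Forward pass of Attention-Fusion for two (C, H, W) maps with equal C."""
    t1 = conv1x1(x1, *params["conv1"])      # T1
    t2 = conv1x1(x2, *params["conv2"])      # T2
    z1 = t1.mean(axis=(1, 2))               # global average pooling over H x W -> (C,)
    z2 = t2.mean(axis=(1, 2))
    # "Fully connected" step; on a 1x1 pooled map this equals a 1x1, stride-1 convolution.
    w1, b1 = params["fc1"]
    w2, b2 = params["fc2"]
    s1 = w1 @ z1 + b1                       # S1
    s2 = w2 @ z2 + b2                       # S2
    p = s1 + s2                             # P = S1 + S2
    weight = sigmoid(p)                     # per-channel importance in (0, 1)
    return (x1 + x2) * weight[:, None, None]   # Y = (X1 + X2) * Sigmoid(P)


def init_params(c, rng):
    """Hypothetical random initialization; all layers map C -> C channels."""
    def layer():
        return rng.standard_normal((c, c)) * 0.1, np.zeros(c)
    return {name: layer() for name in ("conv1", "conv2", "fc1", "fc2")}


rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 13, 13))      # e.g. an upsampled deep feature map
x2 = rng.standard_normal((8, 13, 13))      # e.g. a shallower feature map of the same size
y = attention_fusion(x1, x2, init_params(8, rng))
```

With all weights set to zero every channel weight becomes Sigmoid(0) = 0.5, so the block reduces to plain averaging of X1 and X2; the learned weights are what let it emphasize some channels over others.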

Step six: model training improvements

1) Cosine learning rate

When the objective function is optimized with a gradient descent algorithm, a cosine function is used to decay the learning rate. The learning rate varies with the training epoch as follows:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{\max}}\pi\right)\right)$$

where $\eta_{\min}$ and $\eta_{\max}$ define the range of the learning rate, $T_{cur}$ is the number of epochs executed so far, and $T_{\max}$ is the total number of epochs.

For ease of implementation, the invention modifies this as follows: in actual training, the optimizer's total iteration counter and $T_{cur}$ are reset (re-initialized) each time the corresponding epoch is reached.
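A per-epoch version of the cosine schedule with a reset of T_cur can be sketched in plain Python. The values of `eta_min`, `eta_max` and the 10-epoch cycle are hypothetical, and T_max is taken here as the length of one cycle between resets (it equals the total number of epochs when no reset is used).

```python
import math


def cosine_lr(t_cur, t_max, eta_min=1e-5, eta_max=1e-3):
    """eta = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_max))


def schedule_with_restarts(total_epochs, cycle_len, **kw):
    """Learning rate per epoch; T_cur is reset to 0 whenever a cycle of cycle_len epochs ends."""
    lrs, t_cur = [], 0
    for _ in range(total_epochs):
        lrs.append(cosine_lr(t_cur, cycle_len, **kw))
        t_cur += 1
        if t_cur == cycle_len:   # reset T_cur: the schedule restarts from eta_max
            t_cur = 0
    return lrs


lrs = schedule_with_restarts(total_epochs=20, cycle_len=10)
```

In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` provide comparable schedules without hand-written bookkeeping.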

2) Synchronized normalization: the BN parameters are merged into the preceding Conv layer, based on the following principle:

$$y_{BN} = W_{merge}\, x + b_{merge}$$

$$W_{merge} = \frac{\gamma\, W}{\sqrt{\mathrm{Var}[x] + \epsilon}}, \qquad b_{merge} = \frac{\gamma\,(b - E[x])}{\sqrt{\mathrm{Var}[x] + \epsilon}} + \beta$$

where $W_{merge}$ is the weight after fusion, W the weight before fusion, Var[x] and E[x] the variance and statistical mean, over the data set, of the features entering the BN layer, $b_{merge}$ the bias after fusion, b the bias before fusion, γ and β learned parameters, ε a small constant for numerical stability, and $y_{BN}$ the fused output.
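The BN folding described here can be verified numerically. A minimal NumPy sketch, assuming per-output-channel BN statistics and a layer whose first weight axis is the output channel; the 3 → 4 channel sizes and random values are hypothetical.

```python
import numpy as np


def fold_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Merge a BatchNorm layer into the preceding conv/linear layer.
    W: (C_out, ...) weights, b: (C_out,) bias; gamma, beta, mean, var: (C_out,).
    W_merge = gamma * W / sqrt(var + eps);  b_merge = gamma * (b - mean) / sqrt(var + eps) + beta."""
    scale = gamma / np.sqrt(var + eps)
    W_merge = W * scale.reshape(-1, *([1] * (W.ndim - 1)))   # scale each output channel
    b_merge = (b - mean) * scale + beta
    return W_merge, b_merge


# Numerical check on a hypothetical 3 -> 4 channel layer (a 1x1 conv acts like a matrix product).
rng = np.random.default_rng(0)
W, b, x = rng.standard_normal((4, 3)), rng.standard_normal(4), rng.standard_normal(3)
gamma, beta = rng.standard_normal(4), rng.standard_normal(4)
mean, var = rng.standard_normal(4), rng.random(4) + 0.5
y_conv_bn = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-5) + beta   # conv, then BN
W_m, b_m = fold_bn(W, b, gamma, beta, mean, var)
y_merged = W_m @ x + b_m                                               # single fused layer
```

Because the fused layer reproduces conv followed by BN exactly at inference time, the BN layer can be removed from the deployed network without changing its outputs.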

The designed algorithm was experimentally tested as follows.

To verify that the improved algorithm is effective and achieves its intended purpose, it is first evaluated on the general-purpose object detection data set VOC2014 under identical experimental conditions.

Fig. 14 compares the recognition performance of different algorithms. The improved algorithm proposed by the invention outperforms the other classical object detection algorithms in recognition accuracy, reaching an mAP of 81.6%.

Partial detection results are shown in Fig. 15. The algorithm identifies bird nests, spacers, loose strands of ground wires and faded pole number plates well, but vibration dampers are easily missed: their backgrounds are dark and their color is close to that of the background, so their features are not distinctive.

Moreover, because there is currently no exact standard for distinguishing a vibration damper in its normal position from one that has slipped, labeling this defect is questionable, so the slippage defect of vibration dampers is not given much attention.

The above description of the embodiments is only intended to facilitate the understanding of the method of the invention and its core idea. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A multi-target detection method for defects of a power transmission line based on improved YOLOv3 is characterized by comprising the following steps:

Step one: screening data set images. The obtained original images are screened purposefully; each image must contain at least one of six types of objects (ground wires, vibration dampers, bird nests, signboards, ground wire clamps and spacers), and target images meeting the requirements are preliminarily selected;

Step three: after data augmentation is finished, part of the images in the target data set are preprocessed with piecewise linear gray-level transformation, histogram equalization, homomorphic filtering and smoothing-based denoising;

Step four: the target data set preprocessed in step three is sorted and labeled: the pictures are renamed in batches and annotated in batches;

Step six: the improved algorithm is trained on the previously labeled target data set and then used to detect the pictures to be examined.

2. The multi-target detection method for the defects of the power transmission lines based on the improved YOLOv3 as claimed in claim 1, wherein the algorithm improvement in the step five specifically comprises:

where H denotes the height of the feature map, W its width, and C1 and C2 the numbers of channels;

$$P = S_1 + S_2 \tag{4.19}$$

P is passed through a Sigmoid function, and each output weight is regarded as the importance of the corresponding fused feature channel; each channel of the X1 and X2 features is weighted through matrix operations, recalibrating and fusing the original features along the channel dimension to obtain a new feature Y, as follows:

$$Y = (X_1 + X_2) \cdot \mathrm{Sigmoid}(P)$$

3. The multi-target detection method for power transmission line defects based on improved YOLOv3 as claimed in claim 1, wherein in step six, a cosine learning rate and synchronized normalization are used during training;

the cosine learning rate processing specifically comprises: