CN115100549A - Transmission line hardware detection method based on improved YOLOv5

Transmission line hardware detection method based on improved YOLOv5

Info

Publication number
CN115100549A
Authority
CN
China
Prior art keywords
network
hardware
transmission line
yolov5
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210729380.7A
Other languages
Chinese (zh)
Inventor
董凯
申庆斌
王承一
董彦武
刘秋月
李�杰
卢自强
宋建虎
王宏飞
卢自英
秦俊兵
何鹏杰
茹海波
孙红玲
邢闯
史丽君
张博
温玮
李冰
宋欣
郝剑
丁喆
贾金川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Super High Voltage Transmission Branch Of State Grid Shanxi Electric Power Co
Original Assignee
Super High Voltage Transmission Branch Of State Grid Shanxi Electric Power Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Super High Voltage Transmission Branch Of State Grid Shanxi Electric Power Co filed Critical Super High Voltage Transmission Branch Of State Grid Shanxi Electric Power Co
Priority to CN202210729380.7A priority Critical patent/CN115100549A/en
Publication of CN115100549A publication Critical patent/CN115100549A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a transmission line hardware detection method based on improved YOLOv5. Building on the original YOLOv5, depthwise separable convolution is used to reduce the network's parameter count and computation, a squeeze-and-excitation channel attention module is used to strengthen the feature extraction capability of the convolution blocks, and the number of filters in the convolutional layers is further pruned through the geometric median, so that the network is made substantially more lightweight while the recognition accuracy of the original network is preserved. The invention applies YOLOv5 to transmission line hardware detection, meets the real-time and accuracy requirements of edge devices, and exhibits good robustness.

Description

Transmission line hardware detection method based on improved YOLOv5
Technical Field
The invention relates to the field of image target detection, and in particular to a transmission line hardware detection method based on improved YOLOv5.
Background
Transmission lines are the main carrier for transmitting electric energy over medium and long distances. They require metal fittings (hardware) for supporting, fixing and connecting components such as bare conductors and insulators; these fittings come in many varieties with clearly distinct appearances. Fittings damaged by corrosion or deformation can easily cause large-scale power outages and serious economic losses, so efficient detection of hardware targets helps automatically locate defect positions and is of great significance for ensuring the stable and safe operation of the power grid.
In recent years, with the popularization and development of the Internet of Things and artificial intelligence technology, inspecting transmission lines by sampling images with edge devices such as unmanned aerial vehicles and then automatically analyzing the acquired image samples with computer vision and image processing techniques has become one of the main inspection modes. However, mature network models currently have too many parameters, place high demands on hardware computing resources, and detect slowly; they cannot be applied directly to hardware samples photographed on transmission lines, cannot meet the speed and accuracy requirements of line inspection, and their huge parameter counts and computation cannot be deployed on edge devices with limited hardware resources.
Therefore, against this background, lightweight improvement of existing models to resolve the imbalance between detection speed and accuracy for transmission line hardware has become a core topic of research and application.
Disclosure of Invention
The invention aims to provide a transmission line hardware detection method based on improved YOLOv5 that resolves the imbalance between the recognition accuracy and speed of the original method and further reduces the inference time and energy consumption required for detection and recognition. A new backbone and head network is designed and improved, the number of filters in the convolutional layers is further pruned through the geometric median, and the network is made substantially more lightweight while the recognition accuracy of the original network is preserved.
In order to achieve the purpose, the invention provides the following scheme:
a power transmission line hardware detection method based on improved YOLOv5 comprises the following steps:
s1, aiming at the hardware sample image of the power transmission line, constructing a hardware data set, performing data cleaning and labeling work, and making hardware image sets of different types and different scales;
s2, selecting YOLOv5 as a basic framework, and using the lightweight deep separable volume blocks as a cascade module of a backbone network and a fusion channel of a simplified tail network;
s3, introducing an extrusion expansion channel attention mechanism to improve the characteristic expression capability of the rolling blocks;
s4, unifying sample resolution, expanding data set scale through an image enhancement method, and improving network training effect;
s5, training the model by adopting a stochastic gradient descent method, and predicting through a target horizontal coordinate, a vertical coordinate, a width, a height, a prediction confidence coefficient and a classification result to obtain a detection result of the hardware fitting image;
and S6, calculating the importance degree of each filter based on the geometric median, removing the unimportant redundant channel parameters, and recovering the recognition precision by fine tuning training.
Further, in step S1, constructing a transmission line hardware detection data set, performing data cleaning and labeling, and producing hardware image sets of different types and scales specifically includes:
cleaning the transmission line hardware images sampled and photographed in the field, and retaining samples that are clear, contain obvious exposed hardware regions, and are taken at reasonable angles.
Further, in step S2, selecting YOLOv5 as the basic architecture and using lightweight depthwise separable convolution blocks as the cascaded modules of the backbone network and as simplified fusion channels of the head network specifically includes:
first removing the backbone and head networks of the YOLOv5 framework, then replacing the backbone with a combination of depthwise separable convolution blocks, in which depthwise convolution extracts spatial features and pointwise convolution fuses and rescales channel information; finally, the sixth, fourth and second depthwise separable convolution blocks and the seventh and eleventh convolution layers of the network are selected to generate downsampled feature maps of 80 × 80, 40 × 40 and 20 × 20, respectively.
Further, in step S3, introducing a squeeze-and-excitation channel attention mechanism to improve the feature expression capability of the convolution blocks specifically includes:
constructing a squeeze-and-excitation channel attention mechanism by introducing 2 convolutional layers, keeping the number of input channels of the first layer and the number of output channels of the second layer consistent with the whole convolution block, reducing the number of channels by a factor of 4 at the output of the first layer and the input of the second layer, setting the convolution kernel size and stride to 1, and activating the features with global average pooling combined with the ReLU and SiLU functions, so as to recalibrate the importance of the channel features extracted by the depthwise separable convolution block; the calculation formula is as follows:
X_o = \mathrm{SiLU}\big(\mathrm{Conv2}(\mathrm{ReLU}(\mathrm{Conv1}(\mathrm{GAP}(X_i))))\big) \cdot X_i \quad (1)
where X_i and X_o respectively denote the output features of the convolution block and the output features after enhancement by the channel attention mechanism, GAP denotes global average pooling over the pixels of the whole feature map, Conv1 and Conv2 correspond to the first and second convolution operations, and ReLU and SiLU are the two activation functions, computed as follows:
\mathrm{ReLU}(x) = \max(0, x) \quad (2)
\mathrm{SiLU}(x) = \frac{x}{1 + e^{-x}} \quad (3)
further, in step S4, unifying the sample resolution, expanding the data set scale by an image enhancement method, and improving the network training effect specifically includes:
firstly, counting the quantity distribution of the sizes of all images, then uniformly processing the resolution of the images into 640 multiplied by 640, splicing the images through random scaling, random cutting and random arrangement of four images, and finally integrating corresponding detection frame label information according to the effect of random splicing.
Further, in step S5, a stochastic gradient descent method is used to train the model, and the detection result of the hardware image is predicted through the target horizontal coordinate, the vertical coordinate, the width, the height, the prediction confidence and the classification result. The method specifically comprises the following steps:
firstly, inputting the hardware image and label information into a modified YOLOv5 network, and according to the classification loss L cls Positioning loss L loc And target confidence loss L conf Optimizing the weight value of network parameters, dividing a characteristic graph output by a network into KxK grids, predicting 3 anchor frames by each grid, wherein the total loss of the network is the loss accumulated sum of all grid anchor frames, and a loss calculation formula for a single anchor frame is as follows:
L_{cls} = -\sum_{i=1}^{N}\big[p_i^{gt}\log(p_i) + (1 - p_i^{gt})\log(1 - p_i)\big] \quad (4)
L_{loc} = 1 - \mathrm{IOU} + \frac{D_p^2}{D_L^2} + \alpha v, \qquad \alpha = \frac{v}{(1 - \mathrm{IOU}) + v} \quad (5)
v = \frac{4}{\pi^2}\Big(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_p}{h_p}\Big)^2 \quad (6)
L_{conf} = -\big[c_{gt}\log(c) + (1 - c_{gt})\log(1 - c)\big] \quad (7)
where N is the number of hardware categories to be detected, p_i^{gt} is the true label value of the i-th class for the sample, p_i is the network's predicted value for the i-th class, D_p and D_L respectively denote the Euclidean distance between the center points of the network prediction box and the ground-truth label box and the diagonal length of their minimum enclosing rectangle, IOU is the ratio of the intersection to the union of the areas of the prediction box and the ground-truth box, v is a parameter measuring aspect-ratio consistency, w_{gt}, h_{gt}, w_p and h_p respectively denote the width and height of the ground-truth box and the width and height of the prediction box, and c_{gt} and c respectively denote the confidence label of whether an object exists at the prediction box position and the network's predicted confidence that an object exists.
Then, the network training effect is evaluated through the mean Average Precision (mAP), Recall and Precision, calculated as follows:
\mathrm{Precision} = \frac{TP_j}{TP_j + FP_j} \quad (8)
\mathrm{Recall} = \frac{TP_j}{TP_j + FN_j} \quad (9)
\mathrm{AP} = \int_0^1 \mathrm{Precision}\,\mathrm{d}\,\mathrm{Recall} \quad (10)
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AP}_i \quad (11)
where TP_j, FP_j and FN_j respectively denote, among the first j bounding boxes predicted by the model, the number whose area intersection-over-union with the ground-truth label box is greater than 0.5, the number whose intersection-over-union is less than 0.5, and the number misidentified as other categories; finally, the resulting model parameter count and floating point operations (FLOPs) are computed to measure the degree of network lightweighting.
Further, in step S6, calculating the importance of each filter based on the geometric median, removing unimportant redundant channel parameters, and recovering the recognition accuracy through fine-tuning training specifically includes:
first, for a given convolutional layer, sorting the parameter weight tensors of all filters in descending order of their L1 norm; then dividing all layers of the network evenly into 7 parts, with the pruning rate of each part following an arithmetic progression whose cumulative sum equals the set network parameter pruning proportion; and finally removing, according to the pruning rate, the channels whose accumulated Euclidean distance to the other filters is smallest, and performing network forward propagation with only the remaining channel parameters, thereby achieving simplification and compression of the final model.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects: the invention provides a transmission line hardware detection method based on improved YOLOv5 that constructs a hardware data set, adopts an improved YOLOv5 network as the basic detection model, improves the head and backbone networks to reduce network parameters, integrates a squeeze-and-excitation channel attention module to enhance the feature extraction capability of the convolution blocks, and finally further prunes the number of filters in the convolutional layers through the geometric median.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram of the YOLOv5 algorithm;
FIG. 2 is a diagram of a modified YOLOv5 algorithm;
FIG. 3 is a diagram of a depthwise separable convolution block incorporating a channel attention mechanism;
fig. 4 is an effect diagram of the transmission line hardware detection.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A transmission line hardware detection method based on improved YOLOv5 mainly comprises a data acquisition and processing part, a parameter reduction and compression part, and a network training and testing part. The prior-art YOLOv5 algorithm structure is shown in FIG. 1, and the network structure after the improvement of YOLOv5 is shown in FIG. 2. The improvements address the huge parameter count and the imbalance between model complexity and accuracy of the original YOLOv5 network, with the aim of reducing the network's parameter count and computation as much as possible while maintaining high detection accuracy.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The method comprises the following steps:
s1, aiming at the hardware sample image of the power transmission line, constructing a hardware data set, performing data cleaning and labeling work, and making hardware image sets of different types and different scales;
s2, selecting YOLOv5 as a basic framework, and using the lightweight deep separable volume blocks as a cascade module of a backbone network and a fusion channel of a simplified tail network;
s3, introducing an extrusion expansion channel attention mechanism to improve the characteristic expression capability of the rolling blocks;
s4, unifying sample resolution, expanding data set scale through an image enhancement method, and improving network training effect;
s5, training the model by adopting a stochastic gradient descent method, and predicting the detection result of the hardware image through the target horizontal coordinate, the vertical coordinate, the width, the height, the prediction confidence coefficient and the classification result;
and S6, calculating the importance degree of each filter based on the geometric median, removing unimportant redundant channel parameters, and recovering the recognition precision by fine tuning training.
The deep learning model needs to be trained and optimized on a large number of labeled image samples. Because the unmanned aerial vehicle acquires global images of the transmission line by aerial photography, key regions of the samples need to be sub-sampled according to the network input resolution, and to improve the network's training effect the samples containing hardware types need to be further cleaned and screened. Therefore, in step S1, a hardware data set is constructed from transmission line hardware sample images, data cleaning and labeling are performed, and hardware image sets of different types and scales are produced, specifically including:
cleaning the transmission line hardware images sampled and photographed in the field, and retaining clear samples that contain obvious exposed hardware regions and are taken at reasonable angles.
Considering that the original YOLOv5 network has too many parameters to meet the detection requirements of resource-limited equipment, the method introduces depthwise separable convolution to reduce the parameters and computation of the hardware detection network. In step S2, selecting YOLOv5 as the basic architecture and using lightweight depthwise separable convolution blocks as the cascaded modules of the backbone network and as simplified fusion channels of the head network specifically includes:
first removing the backbone and head networks of the YOLOv5 framework, then replacing the backbone with a combination of depthwise separable convolution blocks, in which depthwise convolution extracts spatial features and pointwise convolution fuses and rescales channel information; finally, the sixth, fourth and second depthwise separable convolution blocks and the seventh and eleventh convolution layers of the network are selected to generate downsampled feature maps of 80 × 80, 40 × 40 and 20 × 20, respectively.
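A minimal PyTorch sketch of a depthwise separable convolution block of the kind described above: a depthwise convolution extracts spatial features channel by channel, and a 1 × 1 pointwise convolution fuses and rescales the channel information. The kernel size, batch normalization and SiLU activation shown here are illustrative assumptions rather than the exact layer configuration of the invention.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (spatial features) followed by pointwise conv (channel fusion)."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise: 1x1 conv fuses and rescales channel information.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# Example: a stride-2 block halves a 640x640 input to 320x320.
x = torch.randn(1, 32, 640, 640)
print(DepthwiseSeparableConv(32, 64, stride=2)(x).shape)  # torch.Size([1, 64, 320, 320])
```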
In order to compensate for the reduced fitting ability caused by depthwise separable convolution, the squeeze-and-excitation channel attention module is cascaded with the depthwise separable convolution, as shown in FIG. 3. In step S3, a squeeze-and-excitation channel attention mechanism is introduced to improve the feature expression capability of the convolution block, specifically including:
constructing a squeeze-and-excitation channel attention mechanism inside the convolution block to obtain channel importance coefficients that are multiplied with the original feature map; 2 convolutional layers are introduced, the number of input channels of the first layer and the number of output channels of the second layer are kept consistent with the whole convolution block, the output of the first layer and the input of the second layer reduce the number of channels by a factor of four, the convolution kernel size and stride are both set to 1, and the features are activated with global average pooling combined with ReLU and SiLU, so as to recalibrate the importance of the channel features extracted by the depthwise separable convolution block; the calculation formula is as follows:
X_o = \mathrm{SiLU}\big(\mathrm{Conv2}(\mathrm{ReLU}(\mathrm{Conv1}(\mathrm{GAP}(X_i))))\big) \cdot X_i \quad (1)
where X_i and X_o respectively denote the output features of the convolution block and the output features after enhancement by the channel attention mechanism, GAP denotes global average pooling over the pixels of the whole feature map, Conv1 and Conv2 correspond to the first and second convolution operations, and ReLU and SiLU are the two activation functions, computed as follows:
\mathrm{ReLU}(x) = \max(0, x) \quad (2)
\mathrm{SiLU}(x) = \frac{x}{1 + e^{-x}} \quad (3)
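The squeeze-and-excitation step of formula (1) can be sketched in PyTorch as follows: global average pooling squeezes each channel to a scalar, two 1 × 1 convolutions with a 4-fold channel reduction and the ReLU/SiLU activations produce per-channel weights, and these weights rescale the block output X_i. Class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class SEChannelAttention(nn.Module):
    """Implements X_o = SiLU(Conv2(ReLU(Conv1(GAP(X_i))))) * X_i from formula (1)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)   # GAP over the whole feature map
        self.conv1 = nn.Conv2d(channels, channels // reduction, kernel_size=1, stride=1)
        self.conv2 = nn.Conv2d(channels // reduction, channels, kernel_size=1, stride=1)
        self.relu = nn.ReLU()
        self.silu = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.silu(self.conv2(self.relu(self.conv1(self.gap(x)))))
        return w * x                          # channel-wise recalibration of X_i

x = torch.randn(1, 64, 80, 80)
print(SEChannelAttention(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```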
in step S4, unifying sample resolution, expanding data set scale by an image enhancement method, and improving network training effect, specifically including:
firstly, the number distribution of the sizes of all the images is counted, then the image resolution is uniformly processed into 640 multiplied by 640, splicing is carried out through random scaling, random cutting and random arrangement of four images, and finally corresponding detection frame label information is integrated according to the effect of random splicing.
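A simplified sketch of the four-image splicing described above (the detection-box label remapping mentioned in the text is omitted). The 640 × 640 canvas size follows the text; the random split-point range and padding value are illustrative assumptions.

```python
import random

import cv2
import numpy as np

def mosaic4(images, out_size=640):
    """Stitch four images into one out_size x out_size mosaic.

    A random centre point splits the canvas into four regions; each input
    image is resized to fill one region. Label remapping is omitted here.
    """
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    # Random split point, kept away from the canvas borders.
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [
        (0, 0, cx, cy),                   # top-left
        (cx, 0, out_size, cy),            # top-right
        (0, cy, cx, out_size),            # bottom-left
        (cx, cy, out_size, out_size),     # bottom-right
    ]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        canvas[y1:y2, x1:x2] = cv2.resize(img, (x2 - x1, y2 - y1))
    return canvas
```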
In step S5, training the model with stochastic gradient descent and predicting the detection result of the hardware image from the target's horizontal coordinate, vertical coordinate, width, height, prediction confidence and classification result specifically includes:
first inputting the hardware images and label information into the improved YOLOv5 network and optimizing the network parameter weights according to the classification loss L_cls, localization loss L_loc and target confidence loss L_conf; the feature map output by the network is divided into K × K grids, each grid predicts 3 anchor boxes, and the total network loss is the accumulated loss over all grid anchor boxes; the loss for a single anchor box is calculated as follows:
L_{cls} = -\sum_{i=1}^{N}\big[p_i^{gt}\log(p_i) + (1 - p_i^{gt})\log(1 - p_i)\big] \quad (4)
L_{loc} = 1 - \mathrm{IOU} + \frac{D_p^2}{D_L^2} + \alpha v, \qquad \alpha = \frac{v}{(1 - \mathrm{IOU}) + v} \quad (5)
v = \frac{4}{\pi^2}\Big(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_p}{h_p}\Big)^2 \quad (6)
L_{conf} = -\big[c_{gt}\log(c) + (1 - c_{gt})\log(1 - c)\big] \quad (7)
where N is the number of hardware categories to be detected, p_i^{gt} is the true label value of the i-th class for the sample, p_i is the network's predicted value for the i-th class, D_p and D_L respectively denote the Euclidean distance between the center points of the network prediction box and the ground-truth label box and the diagonal length of their minimum enclosing rectangle, IOU is the ratio of the intersection to the union of the areas of the prediction box and the ground-truth box, v is a parameter measuring aspect-ratio consistency, w_{gt}, h_{gt}, w_p and h_p respectively denote the width and height of the ground-truth box and the width and height of the prediction box, and c_{gt} and c respectively denote the confidence label of whether an object exists at the prediction box position and the network's predicted confidence that an object exists.
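The per-anchor losses of formulas (4) to (7) can be sketched for a single prediction as follows, using the symbols defined above. The CIoU-style form of L_loc, including the weighting coefficient alpha, follows the reconstruction given in formula (5) and should be read as an assumption rather than the exact formulation of the invention.

```python
import math

import torch

def anchor_losses(p, p_gt, iou, d_p, d_l, w_p, h_p, w_gt, h_gt, c, c_gt):
    """Per-anchor losses for one prediction, following formulas (4)-(7).

    p, p_gt   : predicted / true class probabilities, tensors of shape (N,)
    iou       : IoU between predicted and ground-truth boxes (float)
    d_p, d_l  : centre-point distance and enclosing-box diagonal (floats)
    w_*, h_*  : box widths and heights; c, c_gt: objectness and its label
    """
    eps = 1e-7
    # (4) binary cross-entropy over the N hardware classes
    l_cls = -(p_gt * torch.log(p + eps) + (1 - p_gt) * torch.log(1 - p + eps)).sum()
    # (6) aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v + eps)
    # (5) CIoU-style localisation loss
    l_loc = 1 - iou + (d_p ** 2) / (d_l ** 2 + eps) + alpha * v
    # (7) objectness confidence loss
    l_conf = -(c_gt * math.log(c + eps) + (1 - c_gt) * math.log(1 - c + eps))
    return l_cls, l_loc, l_conf
```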
Then, the network training effect is evaluated through the mean Average Precision (mAP), Recall and Precision, calculated as follows:
\mathrm{Precision} = \frac{TP_j}{TP_j + FP_j} \quad (8)
\mathrm{Recall} = \frac{TP_j}{TP_j + FN_j} \quad (9)
\mathrm{AP} = \int_0^1 \mathrm{Precision}\,\mathrm{d}\,\mathrm{Recall} \quad (10)
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AP}_i \quad (11)
where TP_j, FP_j and FN_j respectively denote, among the first j bounding boxes predicted by the model, the number whose area intersection-over-union with the ground-truth label box is greater than 0.5, the number whose intersection-over-union is less than 0.5, and the number misidentified as other categories; finally, the resulting model parameter count and floating point operations (FLOPs) are computed to measure the degree of network lightweighting.
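The evaluation metrics above can be computed roughly as sketched below. The 0.5 IoU threshold for counting true and false positives follows the text; computing AP by all-point interpolation of the precision-recall curve is an assumption of this sketch.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from cumulative TP/FP/FN counts (formulas (8)-(9))."""
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    return precision, recall

def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation, formula (10))."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make precision monotonically decreasing, then integrate over recall.
    p = np.maximum.accumulate(p[::-1])[::-1]
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values (formula (11))."""
    return float(np.mean(ap_per_class))
```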
Deep neural networks contain redundant parameters that ease fitting and convergence on the training samples, so the channel pruning technique can further lighten the trained model. In step S6, the importance of each filter is calculated based on the geometric median, unimportant redundant channel parameters are removed, and the recognition accuracy is recovered through fine-tuning training, specifically including:
first, for a given convolutional layer, sorting the parameter weight tensors of all filters in descending order of their L1 norm; then dividing all layers of the network evenly into 7 parts, with the pruning rate of each part following an arithmetic progression whose cumulative sum equals the set network parameter pruning proportion; and finally removing, according to the pruning rate, the channels in each layer whose accumulated Euclidean distance to the other filters is smallest, and performing network forward propagation with only the remaining channel parameters, thereby achieving simplified compression of the model at prediction time.
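A simplified sketch of the distance-based filter selection described above: for one convolutional layer, the filters whose summed Euclidean distance to all other filters is smallest (i.e. those closest to the geometric median of the layer's filters) are treated as the most redundant and removed first. The layer-wise arithmetic-progression scheduling of pruning rates is omitted, and the function and variable names are illustrative.

```python
import torch
import torch.nn as nn

def redundant_filter_indices(conv: nn.Conv2d, prune_ratio: float) -> torch.Tensor:
    """Return indices of the filters closest to the layer's geometric median.

    Filters with the smallest summed Euclidean distance to all other filters
    carry the most replaceable information and are pruned first.
    """
    w = conv.weight.data.flatten(1)        # (out_channels, in_channels * k * k)
    dist = torch.cdist(w, w, p=2)          # pairwise Euclidean distances
    scores = dist.sum(dim=1)               # summed distance per filter
    n_prune = int(prune_ratio * w.size(0))
    return torch.argsort(scores)[:n_prune] # smallest sums = most redundant

conv = nn.Conv2d(64, 128, kernel_size=3)
print(redundant_filter_indices(conv, prune_ratio=0.3).shape)  # torch.Size([38])
```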
The detection effect of the method of the present invention is shown in FIG. 4. On the basis of the YOLOv5 network, the method improves the backbone and head networks to reduce the model size, introduces a squeeze-and-excitation channel attention module to address the insufficient feature extraction capability of depthwise separable convolution, and further prunes the number of filters in the convolutional layers through the geometric median to address redundancy in the trained model, greatly improving the lightweight degree of the network while preserving the recognition accuracy of the original network. The invention effectively improves the detection performance of the YOLOv5 network, guaranteeing detection accuracy while increasing the detection speed for various transmission line hardware fittings.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (7)

1. A transmission line hardware detection method based on improved YOLOv5, characterized by comprising the following steps:
S1, constructing a hardware data set from transmission line hardware sample images, performing data cleaning and labeling, and producing hardware image sets of different types and scales;
S2, selecting YOLOv5 as the basic architecture, and using lightweight depthwise separable convolution blocks as the cascaded modules of the backbone network and as simplified fusion channels of the head network;
S3, introducing a squeeze-and-excitation channel attention mechanism to improve the feature expression capability of the convolution blocks;
S4, unifying the sample resolution, expanding the data set scale through image augmentation, and improving the network training effect;
S5, training the model with stochastic gradient descent, and predicting the target's horizontal coordinate, vertical coordinate, width, height, prediction confidence and classification result to obtain the detection result for the hardware image;
and S6, calculating the importance of each filter based on the geometric median, removing unimportant redundant channel parameters, and recovering the recognition accuracy through fine-tuning training.
2. The transmission line hardware detection method based on improved YOLOv5 according to claim 1, wherein in step S1, constructing a transmission line hardware detection data set, performing data cleaning and labeling, and producing hardware image sets of different types and scales specifically comprises:
cleaning the transmission line hardware images sampled and photographed in the field, and retaining clear samples that contain obvious exposed hardware regions and are taken at reasonable angles.
3. The transmission line hardware detection method based on improved YOLOv5 according to claim 1, wherein in step S2, selecting YOLOv5 as the basic architecture and using lightweight depthwise separable convolution blocks as the cascaded modules of the backbone network and as simplified fusion channels of the head network specifically comprises:
first removing the backbone and head networks of the YOLOv5 framework, then replacing the backbone with a combination of depthwise separable convolution blocks, in which depthwise convolution extracts spatial features and pointwise convolution fuses and rescales channel information, and finally selecting the sixth, fourth and second depthwise separable convolution blocks and the seventh and eleventh convolution layers of the network to generate downsampled feature maps of 80 × 80, 40 × 40 and 20 × 20, respectively.
4. The transmission line hardware detection method based on improved YOLOv5 according to claim 1, wherein in step S3, introducing a squeeze-and-excitation channel attention mechanism to improve the feature expression capability of the convolution block specifically comprises:
constructing a squeeze-and-excitation channel attention mechanism by introducing 2 convolutional layers, keeping the number of input channels of the first layer and the number of output channels of the second layer consistent with the whole convolution block, reducing the number of channels by a factor of 4 at the output of the first layer and the input of the second layer, setting the convolution kernel size and stride to 1, and activating the features with global average pooling combined with the ReLU and SiLU functions, so as to recalibrate the importance of the channel features extracted by the depthwise separable convolution block, where the calculation formula is as follows:
X_o = \mathrm{SiLU}\big(\mathrm{Conv2}(\mathrm{ReLU}(\mathrm{Conv1}(\mathrm{GAP}(X_i))))\big) \cdot X_i \quad (1)
where X_i and X_o respectively denote the output features of the convolution block and the output features after enhancement by the channel attention mechanism, GAP denotes global average pooling over the pixels of the whole feature map, Conv1 and Conv2 correspond to the first and second convolution operations, and ReLU and SiLU are the two activation functions, computed as follows:
\mathrm{ReLU}(x) = \max(0, x) \quad (2)
\mathrm{SiLU}(x) = \frac{x}{1 + e^{-x}} \quad (3)
5. The transmission line hardware detection method based on improved YOLOv5 according to claim 1, wherein in step S4, unifying the sample resolution, expanding the data set scale through image augmentation, and improving the network training effect specifically comprises:
first counting the size distribution of all images, then uniformly resizing the images to a resolution of 640 × 640, splicing four images together through random scaling, random cropping and random arrangement, and finally merging the corresponding detection-box label information according to the random splicing result.
6. The transmission line hardware detection method based on improved YOLOv5 according to claim 1, wherein in step S5, training the model with stochastic gradient descent and predicting the detection result of the hardware image from the target's horizontal coordinate, vertical coordinate, width, height, prediction confidence and classification result specifically comprises:
first inputting the hardware images and label information into the improved YOLOv5 network and optimizing the network parameter weights according to the classification loss L_cls, localization loss L_loc and target confidence loss L_conf; the feature map output by the network is divided into K × K grids, each grid predicts 3 anchor boxes, and the total network loss is the accumulated loss over all grid anchor boxes; the loss for a single anchor box is calculated as follows:
L_{cls} = -\sum_{i=1}^{N}\big[p_i^{gt}\log(p_i) + (1 - p_i^{gt})\log(1 - p_i)\big] \quad (4)
L_{loc} = 1 - \mathrm{IOU} + \frac{D_p^2}{D_L^2} + \alpha v, \qquad \alpha = \frac{v}{(1 - \mathrm{IOU}) + v} \quad (5)
v = \frac{4}{\pi^2}\Big(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_p}{h_p}\Big)^2 \quad (6)
L_{conf} = -\big[c_{gt}\log(c) + (1 - c_{gt})\log(1 - c)\big] \quad (7)
where N is the number of hardware categories to be detected, p_i^{gt} is the true label value of the i-th class for the sample, p_i is the network's predicted value for the i-th class, D_p and D_L respectively denote the Euclidean distance between the center points of the network prediction box and the ground-truth label box and the diagonal length of their minimum enclosing rectangle, IOU is the ratio of the intersection to the union of the areas of the prediction box and the ground-truth box, v is a parameter measuring aspect-ratio consistency, w_{gt}, h_{gt}, w_p and h_p respectively denote the width and height of the ground-truth box and the width and height of the prediction box, and c_{gt} and c respectively denote the confidence label of whether an object exists at the prediction box position and the network's predicted confidence that an object exists.
Then, the network training effect is evaluated through the mean Average Precision (mAP), Recall and Precision, calculated as follows:
\mathrm{Precision} = \frac{TP_j}{TP_j + FP_j} \quad (8)
\mathrm{Recall} = \frac{TP_j}{TP_j + FN_j} \quad (9)
\mathrm{AP} = \int_0^1 \mathrm{Precision}\,\mathrm{d}\,\mathrm{Recall} \quad (10)
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AP}_i \quad (11)
where TP_j, FP_j and FN_j respectively denote, among the first j bounding boxes predicted by the model, the number whose area intersection-over-union with the ground-truth label box is greater than 0.5, the number whose intersection-over-union is less than 0.5, and the number misidentified as other categories; finally, the resulting model parameter count and floating point operations (FLOPs) are computed to measure the degree of network lightweighting.
7. The transmission line hardware detection method based on improved YOLOv5 according to claim 1, wherein in step S6, calculating the importance of each filter based on the geometric median, removing unimportant redundant channel parameters, and recovering the recognition accuracy through fine-tuning training specifically comprises:
first, for a given convolutional layer, sorting the parameter weight tensors of all filters in descending order of their L1 norm; then dividing all layers of the network evenly into 7 parts, with the pruning rate of each part following an arithmetic progression whose cumulative sum equals the set network parameter pruning proportion; and finally removing, according to the pruning rate, the channels in each layer whose accumulated Euclidean distance to the other filters is smallest, and performing network forward propagation with only the remaining channel parameters, thereby achieving simplified compression of the final model.
CN202210729380.7A 2022-06-24 2022-06-24 Transmission line hardware detection method based on improved YOLOv5 Pending CN115100549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210729380.7A CN115100549A (en) 2022-06-24 2022-06-24 Transmission line hardware detection method based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210729380.7A CN115100549A (en) 2022-06-24 2022-06-24 Transmission line hardware detection method based on improved YOLOv5

Publications (1)

Publication Number Publication Date
CN115100549A true CN115100549A (en) 2022-09-23

Family

ID=83292700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210729380.7A Pending CN115100549A (en) 2022-06-24 2022-06-24 Transmission line hardware detection method based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN115100549A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908999A (en) * 2022-11-25 2023-04-04 合肥中科类脑智能技术有限公司 Method for detecting corrosion of top hardware fitting of power distribution tower, medium and edge terminal equipment
CN115935263A (en) * 2023-02-22 2023-04-07 和普威视光电股份有限公司 Yoov 5 pruning-based edge chip detection and classification method and system
CN116612087A (en) * 2023-05-22 2023-08-18 山东省人工智能研究院 Coronary artery CTA stenosis detection method based on YOLOv5-LA
CN116612087B (en) * 2023-05-22 2024-02-23 山东省人工智能研究院 Coronary artery CTA stenosis detection method based on YOLOv5-LA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination