CN114494908A - Improved YOLOv5 power transmission line aerial image defect detection method

Info

Publication number: CN114494908A
Application number: CN202210128056.XA
Authority: CN
Inventors: 顾菊平; 胡俊杰; 朱建红; 张新松; 王子旭; 周伯俊; 赵凤申; 张思绪
Original assignee: Nantong University
Current assignee: Nantong University
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-05-13

Abstract

The invention relates to the technical field of image processing, and in particular to a method for detecting defects in aerial images of power transmission lines based on an improved YOLOv5 network. First, an aerial image data set of power transmission lines is established, expanded by image data augmentation, and balanced across sample classes. The YOLOv5 network is then improved in four ways: the FOCUS module is replaced with a 2 × 2 convolution module so that original image features are better preserved; a smaller-scale detection branch is added to improve small-object detection; features of the backbone network are connected to the feature fusion layers of the neck network to prevent feature loss; and a C3MHSA module is constructed that fuses the CNN with an attention mechanism to strengthen learning of targets under interference. Finally, the image to be inspected is fed into the improved YOLOv5 network, which outputs the identification and detection result. The invention can identify and detect multiple defects in a single image, such as insulator string drop-off, vibration damper drop-off, and bird nests, and performs better with complex backgrounds, occlusion, and small targets.

Description

Improved YOLOv5 power transmission line aerial image defect detection method

Technical Field

The invention relates to the technical field of image processing, and in particular to a defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network.

Background

Power transmission lines span large areas, pass through varied environments and complex climates, and are directly exposed to the natural environment for long periods. They are therefore easily affected by natural factors such as wind, rain, and lightning, and are prone to damage such as insulator string drop-off and vibration damper drop-off; damage to these components can interrupt the power supply of the line. Bird damage is another important cause of transmission line faults: birds perching on a line can cause phase-to-phase short circuits, and nesting material can short-circuit the gaps between conductors. If a fault is not handled in time, the power supply of the whole line may be interrupted, and a large-area outage may even result. Regular inspection of power transmission lines is therefore essential.

With the development of smart grids, the State Grid Corporation of China and China Southern Power Grid are actively building unmanned aerial vehicle (UAV) inspection teams, shifting to an inspection mode in which machine inspection is primary and manual inspection is auxiliary. A UAV reaches a designated position using positioning technology and autonomously photographs the transmission line equipment from multiple angles; the images, which carry a large amount of information on equipment condition, are transmitted to a ground station, where staff process them. Compared with traditional manual inspection, UAV inspection is not constrained by terrain such as mountains and lakes, poses no risk to personal safety, offers flexible shooting angles without visual blind spots, and can greatly reduce labor and time costs while improving inspection efficiency.

While UAV inspection has many advantages, it also presents new challenges. The UAV only acquires the images; fault points still have to be marked manually, which demands considerable experience from staff and is prone to missed detections and misjudgments. In addition, components vary in size and backgrounds are complex, so manual inspection easily causes visual fatigue and reduces the accuracy of judgment. An object detection method that automatically detects components in images is therefore urgently needed. With the development of computer vision and advances in hardware, deep learning has excelled at image processing: high-level image features are extracted by a convolutional neural network and then used for classification and localization. However, because of the complex background of transmission lines, occlusion of components, and the small size of targets at defect positions, image recognition performance remains unsatisfactory. Further research is therefore needed to solve these technical problems.

Disclosure of Invention

To address the above problems, the invention provides an improved YOLOv5 method for detecting defects in aerial images of power transmission lines. It can identify and detect multiple defects in a single image, such as insulator string drop-off, vibration damper drop-off, and bird nests, and achieves good detection performance with complex backgrounds, occlusion, and small targets.

To achieve this purpose, the invention adopts the following technical scheme:

A defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network comprises the following steps:

Step 1, establishing an aerial image data set of power transmission lines, expanding it using image data augmentation, and balancing the sample classes; annotating the images with the LabelImg tool, marking the position and class of each object with a rectangular box, and generating the corresponding XML-format annotation files;

Step 2, constructing an improved YOLOv5 network: replacing the original FOCUS layer with a 2 × 2 convolution module; adding a smaller-scale detection layer to improve small-object detection; connecting features of the backbone network to the feature fusion layers of the neck network to prevent feature loss; and constructing a C3MHSA module that combines an attention mechanism with the CNN, in which the 3 × 3 convolution module in the C3 module is replaced by an MHSA module, and replacing the last C3 module of the Backbone and of the Neck with the C3MHSA module, thereby enhancing robustness to interference under complex backgrounds and occlusion;

Step 3, dividing the images obtained in step 1 into a training set, a validation set, and a test set; setting the training parameters, training the improved YOLOv5 network on the training set images, and saving the best network weights as best.pt;

Step 4, loading the best network weights obtained in step 3 into the network, running detection on the test set, and outputting results in which the position of each power transmission line component in the image is boxed and its class is labeled.

Preferably, the specific method of step 1 is as follows:

Step 1.1, expanding the aerial images of the power transmission line by data augmentation, including horizontal flipping, rotation, random channel transformation, random chromaticity transformation, and added Gaussian noise;

Step 1.2, annotating the images with the LabelImg tool, marking the position and class of each object with a rectangular box, and generating an XML file that records the image name and the position, size, and class of each object in the image.
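Two of the augmentations listed in step 1.1 can be sketched as follows. This is a minimal illustration rather than the method's actual pipeline: images are plain nested lists of grayscale values, boxes are assumed to be `(xmin, ymin, xmax, ymax)` pixel tuples, and the function names, noise level `sigma`, and `seed` are hypothetical. The point it shows is that geometric augmentations such as flipping must also transform the annotated rectangles, while pixel-level ones such as noise leave them unchanged.

```python
import random


def hflip(image, boxes):
    """Mirror an image (list of pixel rows) left-right and mirror its boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # A box spanning [xmin, xmax] maps to [width - xmax, width - xmin].
    new_boxes = [(width - xmax, ymin, width - xmin, ymax)
                 for xmin, ymin, xmax, ymax in boxes]
    return flipped, new_boxes


def add_gaussian_noise(image, sigma=10.0, seed=0):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


image = [[0, 50, 100, 150],
         [10, 60, 110, 160]]
boxes = [(0, 0, 2, 1)]

flipped, flipped_boxes = hflip(image, boxes)
noisy = add_gaussian_noise(image)
print(flipped[0], flipped_boxes)
```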

Preferably, the specific method of step 2 is as follows:

Step 2.1, constructing the improved YOLOv5 network and removing the FOCUS layer. The original FOCUS layer compresses the image by slicing, moving spatial information into the channel dimension; this increases speed but reduces accuracy. The FOCUS layer is therefore replaced with a 2 × 2 convolution module, which better preserves the original image features. The convolution module (CBS) consists of a convolution layer, a batch normalization (BN) layer, and an activation function; the activation function is SiLU, given by $\mathrm{SiLU}(x) = x \cdot \mathrm{sigmoid}(x)$.

Step 2.2, adding a 160 × 160 detection layer to the original three detection layers to improve small-object detection; a branch is led out from the 80 × 80 Concat module of the Neck layer and brought to 160 × 160 by C3, CBS, and upsampling modules; at the same time, a branch is led out from the 160 × 160 C3 module of the Backbone layer; the two branches are merged and fed into a C3 module for feature integration, and then passed through a 1 × 1 convolution module into the detection head;

Step 2.3, constructing a C3MHSA module based on the C3 module, wherein the C3 module is a simplification of BottleneckCSP and comprises three 1 × 1 convolution modules and n Bottleneck modules, and each Bottleneck module comprises a 1 × 1 convolution module, a 3 × 3 convolution module, and a residual connection; in the C3MHSA module, the 3 × 3 convolution module in the Bottleneck module is replaced with a multi-head self-attention (MHSA) mechanism, which strengthens learning under occlusion and complex backgrounds. Because large objects suffer more interference and occlusion, the lowest-level C3 module of the Backbone and the lowest-level C3 module of the Neck are replaced with C3MHSA modules.

The MHSA (Multi-Head Self-Attention) module adds a two-dimensional position encoding consisting of $R_h$ and $R_w$, which represent relative position information in the vertical and horizontal directions, respectively. The procedure is as follows. First, the input passes through three pointwise (1 × 1) convolutions to obtain q, k, and v, and $R_h$ and $R_w$ are added to obtain r, where q, k, v, and r denote the query, key, value, and position encoding, respectively. Next, q is matrix-multiplied with r to obtain $qr^T$ and with k to obtain $qk^T$. The sum $qr^T + qk^T$ is passed through a softmax layer and matrix-multiplied with v to give the single-head output. Finally, the outputs of the multiple heads are concatenated and fused through a matrix to give the multi-head output.

Step 2.4, leading out one branch each from the 80 × 80 and 40 × 40 C3 modules and the 20 × 20 C3MHSA module of the Backbone layer and connecting them to the corresponding 80 × 80, 40 × 40, and 20 × 20 Concat modules of the PAN layer, to prevent feature loss.
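The difference between FOCUS slicing and a 2 × 2 convolution in step 2.1 can be illustrated on a single-channel image. The sketch below is not the network code: it uses nested lists instead of tensors, and it assumes the 2 × 2 convolution runs with stride 2 and no padding, which the text does not state. Under that assumption, each 2 × 2 window covers exactly the four pixels that slicing would spread over four channels, but the convolution combines them directly with learned weights (an averaging kernel stands in for learned weights here).

```python
def focus_slice(image):
    """FOCUS-style slicing: one (H, W) channel -> four (H/2, W/2) channels."""
    return [
        [row[0::2] for row in image[0::2]],  # even rows, even columns
        [row[0::2] for row in image[1::2]],  # odd rows, even columns
        [row[1::2] for row in image[0::2]],  # even rows, odd columns
        [row[1::2] for row in image[1::2]],  # odd rows, odd columns
    ]


def conv2x2_stride2(image, kernel):
    """2 x 2 convolution with stride 2 and no padding (stride is an assumption)."""
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(2) for j in range(2))
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

channels = focus_slice(image)
# Averaging kernel used only as a stand-in for learned weights.
downsampled = conv2x2_stride2(image, [[0.25, 0.25], [0.25, 0.25]])
print(channels[0], downsampled)
```

Both operations halve the spatial resolution; they differ in whether the four neighbouring pixels are kept as separate channels or mixed by a learned kernel.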

Preferably, the specific method of step 3 is as follows:

Step 3.1, dividing the images and XML files obtained in step 1 into a training set, a validation set, and a test set in the ratio 70% : 15% : 15%;

Step 3.2, setting the training parameters as follows: batch size 16, initial learning rate 0.01, final learning rate 0.2, momentum 0.937, and weight decay coefficient 0.0005;

Step 3.3, using CIoU as the loss function $L_{CIoU}$ of the prediction box, defined as:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where IoU is the intersection over union of the prediction box and the ground-truth box, $\rho(b, b^{gt})$ is the Euclidean distance between the center points of the prediction box and the ground-truth box, c is the diagonal length of the smallest enclosing region that contains both boxes, $\alpha$ is a weight function, and v measures the similarity of the aspect ratios;

Step 3.4, training the improved YOLOv5 network on the training set images, minimizing the loss function over multiple iterations, stopping training when the maximum number of training iterations is reached, and saving the best network weights from training.
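The CIoU loss named in step 3.3 can be sketched for a single pair of boxes as follows. The text defines ρ, c, α, and v only in words; the explicit expressions for `v` and `alpha` below are taken from the commonly used CIoU formulation and are an assumption here, as are the `(x1, y1, x2, y2)` box layout, the function name, and the small `eps` added for numerical safety.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for a predicted box and a ground-truth box, each (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # rho^2: squared distance between the two box centres.
    rho2 = (((x1 + x2) - (gx1 + gx2)) / 2) ** 2 + (((y1 + y2) - (gy1 + gy2)) / 2) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # v: aspect-ratio consistency; alpha: its weight (assumed standard forms).
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


identical = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))
print(identical, disjoint)
```

Identical boxes give a loss close to zero, while disjoint boxes give a loss above 1 because the center-distance term keeps growing after IoU has dropped to zero; this is what lets the loss still provide a training signal when boxes do not overlap.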

Preferably, the specific method of step 4 is as follows:

Step 4.1, loading the best network weights obtained in step 3 into the network, running detection on the test set images, and outputting results in which the position of each power transmission line component in the image is boxed and its class is labeled;

Step 4.2, evaluating the network using precision, recall, and mean average precision (mAP).

Compared with the prior art, the invention has the following beneficial effects:

1. Image augmentation strengthens the generalization ability of the network, so the method suits a variety of scenes. It is not limited to a single defect type, can box multiple defects in an image at the same time, and is therefore well suited to UAV power inspection.

2. To address the small size of distant objects and of defect regions in aerial images of power transmission lines taken during UAV power inspection, a small-target detection layer is added. Although this increases the computation of the network, it improves the network's ability to detect small targets.

3. To address the frequent mutual occlusion of objects in aerial images of power transmission lines and the increased recognition difficulty caused by complex backgrounds, the CIoU prediction-box loss function is used and a combination of CNN and attention mechanism is introduced, so that targets are recognized better under occlusion and complex backgrounds.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a general block diagram of the improved YOLOv5 of the present invention;

FIG. 3 is a block diagram of a C3MHSA module constructed in accordance with the present invention;

FIG. 4 is a schematic diagram of the improved YOLOv5 training process of the present invention;

FIG. 5 shows actual detection results obtained with the trained YOLOv5 network of the present invention;

FIG. 6 is a comparison graph of evaluation indexes before and after improvement of the present invention.

Detailed Description

The technical scheme of the invention is explained in detail below with reference to the drawings and an embodiment:

As shown in FIG. 1, a defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network includes the following steps:

Step 1, establishing an aerial image data set of power transmission lines. The images come from the public CPLID data set and from other images collected on GitHub, 1404 images in total. The data set is expanded with image data augmentation, and the sample classes are balanced. The augmentation techniques include horizontal flipping, rotation, random channel transformation, random chromaticity transformation, and added Gaussian noise. The images are annotated with the LabelImg tool: the position and class of each object are marked with a rectangular box, and the corresponding XML-format files are generated. Each XML file mainly records the image name and the position, size, and class of each object in the image.
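Reading one such XML annotation can be sketched as below. The sample assumes the Pascal VOC layout that LabelImg writes by default (`filename`, `size`, and one `object` element per box with a `bndbox`); the file name, class name, and coordinates are made up for illustration. The conversion to normalized center/width/height values is an additional assumption, included because YOLO-style training labels are commonly stored that way; the text itself does not describe this conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout.
SAMPLE_XML = """<annotation>
  <filename>line_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>insulator</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>240</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return the image name, image size, and a list of labeled boxes."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    objects = []
    for obj in root.iter("object"):
        xmin, ymin, xmax, ymax = (
            int(obj.findtext(f"bndbox/{tag}"))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({
            "name": obj.findtext("name"),
            "box": (xmin, ymin, xmax, ymax),
            # Normalized (x_center, y_center, width, height).
            "yolo": ((xmin + xmax) / 2 / width, (ymin + ymax) / 2 / height,
                     (xmax - xmin) / width, (ymax - ymin) / height),
        })
    return {"filename": root.findtext("filename"),
            "size": (width, height), "objects": objects}


annotation = parse_annotation(SAMPLE_XML)
print(annotation["filename"], annotation["objects"][0]["yolo"])
```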

Step 2, constructing the improved YOLOv5 network, whose structure is shown in FIG. 2.

Step 2.1, removing the original FOCUS layer and replacing it with a 2 × 2 convolution module. The original FOCUS layer compresses the image by slicing, moving spatial information into the channel dimension; this increases speed but reduces accuracy. The FOCUS layer is therefore replaced with a 2 × 2 convolution module, which better preserves the original image features. The convolution module consists of a convolution layer, a batch normalization (BN) layer, and an activation function; the activation function is SiLU, given by $\mathrm{SiLU}(x) = x \cdot \mathrm{sigmoid}(x)$.
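The SiLU activation used in the convolution module can be written out directly from its formula. The sketch below is scalar Python for illustration only; the function names are hypothetical, and a real network would apply the same expression elementwise to a tensor.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    """SiLU activation: the input scaled by its own sigmoid."""
    return x * sigmoid(x)


samples = [silu(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(samples)
```

Unlike ReLU, SiLU is smooth and lets small negative values pass through attenuated rather than zeroing them.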

Step 2.2, adding a 160 × 160 detection layer to the original three detection layers to improve small-object detection. A branch is led out from the 80 × 80 Concat module of the Neck layer and brought to 160 × 160 by C3, CBS, and upsampling modules; at the same time, a branch is led out from the 160 × 160 C3 module of the Backbone layer. The two branches are merged and fed into a C3 module for feature integration, and then passed through a 1 × 1 convolution module into the detection head.
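The grid sizes of the four detection layers follow from the input resolution and the downsampling stride of each layer. The sketch below assumes a 640 × 640 input and strides of 4, 8, 16, and 32; neither is stated in the text, but together they reproduce the 160, 80, 40, and 20 grids it names. Counting grid cells also shows why the added layer increases computation.

```python
def grid_sizes(input_size=640, strides=(4, 8, 16, 32)):
    """Map each stride to the side length of its detection grid."""
    return {stride: input_size // stride for stride in strides}


sizes = grid_sizes()
cells = {stride: side * side for stride, side in sizes.items()}

original_cells = cells[8] + cells[16] + cells[32]  # the three original layers
added_cells = cells[4]                             # the added 160 x 160 layer
print(sizes, original_cells, added_cells)
```

Under these assumptions the added layer alone has 25600 grid cells, about three times as many as the three original layers combined (8400), while each of its cells covers only a 4 × 4 pixel patch of the input, which is what makes it suited to small targets.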

Step 2.3, leading out one branch each from the 80 × 80 and 40 × 40 C3 modules and the 20 × 20 C3MHSA module of the Backbone layer and connecting them to the 80 × 80, 40 × 40, and 20 × 20 Concat modules of the PAN layer, to prevent feature loss.

Step 2.4, constructing a C3MHSA module, whose structure is shown in FIG. 3. The C3MHSA module is an improvement on the C3 module. The C3 module is a simplification of BottleneckCSP and comprises three 1 × 1 convolution modules and n Bottleneck modules; each Bottleneck module comprises a 1 × 1 convolution module, a 3 × 3 convolution module, and a residual connection. In the C3MHSA module, the 3 × 3 convolution module in the Bottleneck module is replaced with a multi-head self-attention (MHSA) mechanism, which strengthens learning under occlusion and complex backgrounds. Because large targets suffer more interference and occlusion, the lowest-level C3 module of the Backbone and the lowest-level C3 module of the Neck are replaced with C3MHSA modules.

The MHSA (Multi-Head Self-Attention) module adds a two-dimensional position encoding consisting of $R_h$ and $R_w$, which represent relative position information in the vertical and horizontal directions, respectively. The specific operation is as follows. First, the input passes through three pointwise (1 × 1) convolutions to obtain q, k, and v, and $R_h$ and $R_w$ are added to obtain r, where q, k, v, and r denote the query, key, value, and position encoding, respectively. Next, q is matrix-multiplied with r to obtain $qr^T$ and with k to obtain $qk^T$. The sum $qr^T + qk^T$ is passed through a softmax layer and matrix-multiplied with v to give the single-head output. Finally, the outputs of the multiple heads are concatenated and fused through a matrix to give the multi-head output.
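The single-head computation described above, softmax(qk^T + qr^T)·v, can be sketched with small matrices. This is an illustration under simplifying assumptions, not the module itself: the pointwise convolutions that produce q, k, and v are omitted and the values are supplied directly, the feature map is a hypothetical 2 × 2 grid flattened to four tokens of dimension 2, all helper names are invented, and no scaling factor is applied to the logits because the text does not mention one.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def position_encoding(r_h, r_w):
    """r for grid cell (i, j) is R_h[i] + R_w[j]; cells are flattened row by row."""
    return [[a + b for a, b in zip(rh, rw)] for rh in r_h for rw in r_w]


def single_head_attention(q, k, v, r):
    """Return softmax(q k^T + q r^T) v and the attention weights."""
    content = matmul(q, transpose(k))    # q k^T, content-content term
    position = matmul(q, transpose(r))   # q r^T, content-position term
    logits = [[c + p for c, p in zip(c_row, p_row)]
              for c_row, p_row in zip(content, position)]
    weights = softmax_rows(logits)
    return matmul(weights, v), weights


# Hypothetical values for a 2 x 2 feature map with 2 channels.
r = position_encoding([[0.1, 0.0], [0.0, 0.1]], [[0.2, 0.0], [0.0, 0.2]])
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
output, weights = single_head_attention(q, k, v, r)
print(output)
```

A multi-head version would run this computation once per head on separate slices of the channels, concatenate the per-head outputs, and apply a final fusing matrix, as the paragraph above describes.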

Step 3, dividing the images obtained in step 1 into a training set, a validation set, and a test set; setting the training parameters, training the improved YOLOv5 network on the training set images, and saving the best network weights as best.pt. The specific operations are as follows:

Step 3.1, dividing the images and XML files obtained in step 1 into a training set, a validation set, and a test set in the ratio 70% : 15% : 15%.
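A 70% : 15% : 15% split can be sketched as a seeded shuffle followed by slicing. The file names, the seed, and the rounding rule (truncating the training and validation counts and giving the remainder to the test set) are assumptions; the text gives only the ratio. The count of 1404 is used purely as an example size, since the text does not say whether the split is applied before or after augmentation.

```python
import random


def split_dataset(items, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle items reproducibly and cut them into train / validation / test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


# Hypothetical image identifiers; each would have a matching XML file.
names = [f"img_{i:04d}" for i in range(1404)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))
```

Splitting by image identifier keeps each image and its XML annotation in the same subset.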

Step 3.2, adopting CIoU as the loss function $L_{CIoU}$ of the prediction box, defined as:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$

where IoU is the intersection over union of the prediction box and the ground-truth box, $\rho(b, b^{gt})$ is the Euclidean distance between the center points of the prediction box and the ground-truth box, c is the diagonal length of the smallest enclosing region that contains both boxes, $\alpha$ is a weight function, and v measures the similarity of the aspect ratios.

Step 3.3, setting the training parameters as follows: batch size 16, initial learning rate 0.01, final learning rate 0.2, momentum 0.937, and weight decay coefficient 0.0005.
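The stated final learning rate of 0.2 is larger than the initial rate of 0.01. One possible reading, assumed here and not confirmed by the text, is that 0.2 is a multiplier on the initial rate, as with the `lrf` entry in the hyperparameter files of the Ultralytics YOLOv5 code base, so that training would end at 0.01 × 0.2 = 0.002. The sketch below applies that reading with a cosine decay; the schedule shape, the dictionary keys, and the epoch count of 100 are likewise assumptions.

```python
import math

# Values from step 3.3; the key names follow a YOLOv5-style convention (assumed).
HYP = {
    "batch_size": 16,
    "lr0": 0.01,            # initial learning rate
    "lrf": 0.2,             # read here as final-rate multiplier (assumption)
    "momentum": 0.937,
    "weight_decay": 0.0005,
}


def lr_at_epoch(epoch, epochs, lr0=HYP["lr0"], lrf=HYP["lrf"]):
    """Cosine decay from lr0 at epoch 0 to lr0 * lrf at the last epoch."""
    factor = ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1
    return lr0 * factor


epochs = 100  # hypothetical; the text gives no epoch count
schedule = [lr_at_epoch(e, epochs) for e in range(epochs + 1)]
print(schedule[0], schedule[epochs // 2], schedule[-1])
```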

Step 3.4, training the improved YOLOv5 network on the training set images; the training process is shown in FIG. 4. The loss function is minimized over multiple iterations, training stops when the maximum number of training iterations is reached, and the best network weights from training are saved.

Step 4, loading the best network weights obtained in step 3 into the network, running detection on the test set, and outputting results in which the position of each power transmission line component in the image is boxed and its class is labeled.

As shown in FIG. 5, the specific positions of insulators, dampers, bird nests, defect1 (insulator defect), defect2 (damper defect), and so on are automatically boxed and their classes are labeled.

The network is evaluated using metrics such as precision, recall, and mean average precision (mAP).
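The three metrics can be sketched as follows. The counts and per-class values are invented for illustration and are not results from the invention. Average precision is computed here as the area under the precision-recall curve using the monotone precision envelope, which is one common convention; the text does not state which interpolation or which IoU threshold is used to decide whether a detection counts as correct.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Area under the precision-recall curve for one class.

    `recalls` must be increasing; precision is first made non-increasing
    from right to left (the precision envelope).
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [1.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class average precisions."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical counts and curves, for illustration only.
precision, recall = precision_recall(tp=8, fp=2, fn=2)
ap = average_precision([0.5, 1.0], [1.0, 0.5])
m_ap = mean_average_precision({"insulator": 0.9, "damper": 0.8})
print(precision, recall, ap, m_ap)
```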

As shown in FIG. 6, the final mAP of the improved YOLOv5 network is 0.956, up from 0.927 before the improvement (an absolute gain of 0.029, or about 3.1% in relative terms). The gains for the vibration damper and defect2 (vibration damper defect) classes are larger, reaching 5%, which also confirms that the method recognizes small targets better.

One embodiment of the present invention has been described in detail above, but it is only a preferred embodiment and should not be taken as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within its scope of protection.

Claims

1. A defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network, characterized by comprising the following steps:

2. The defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network as claimed in claim 1, wherein the specific method of step 1 is as follows:

3. The defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network as claimed in claim 1, wherein the specific method of step 2 is as follows:

step 2.1, constructing the improved YOLOv5 network and removing the FOCUS layer; a 2 × 2 convolution module replaces the FOCUS layer so that the original image features are better preserved;

step 2.2, adding a 160 × 160 detection layer to the original three detection layers to improve small-object detection; a branch is led out from the 80 × 80 Concat module of the Neck layer and brought to 160 × 160 by C3, CBS, and upsampling modules; at the same time, a branch is led out from the 160 × 160 C3 module of the Backbone layer; the two branches are merged and fed into a C3 module for feature integration, and then passed through a 1 × 1 convolution module into the detection head;

step 2.3, constructing a C3MHSA module based on the C3 module, wherein the C3 module is a simplification of BottleneckCSP and comprises three 1 × 1 convolution modules and n Bottleneck modules, and each Bottleneck module comprises a 1 × 1 convolution module, a 3 × 3 convolution module, and a residual connection; in the C3MHSA module, the 3 × 3 convolution module in the Bottleneck module is replaced with an MHSA multi-head self-attention mechanism for strengthening learning under occlusion and complex backgrounds;

step 2.4, leading out one branch each from the 80 × 80 and 40 × 40 C3 modules and the 20 × 20 C3MHSA module of the Backbone layer and connecting them to the 80 × 80, 40 × 40, and 20 × 20 Concat modules of the PAN layer, to prevent feature loss.

4. The defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network as claimed in claim 3, wherein the convolution module CBS consists of a convolution layer, a BN layer, and an activation function, the activation function being SiLU, given by $\mathrm{SiLU}(x) = x \cdot \mathrm{sigmoid}(x)$.

5. The defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network as claimed in claim 3, wherein the MHSA module adds a two-dimensional position encoding consisting of $R_h$ and $R_w$, which represent relative position information in the vertical and horizontal directions, respectively, and the specific flow is as follows: first, the input passes through three pointwise convolutions to obtain q, k, and v, and $R_h$ and $R_w$ are added to obtain r, where q, k, v, and r denote the query, key, value, and position encoding, respectively; next, q is matrix-multiplied with r to obtain $qr^T$ and with k to obtain $qk^T$; the sum $qr^T + qk^T$ is passed through a softmax layer and matrix-multiplied with v to obtain the single-head output; finally, the outputs of the multiple heads are concatenated and fused through a matrix to obtain the multi-head output.

6. The defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network as claimed in claim 1, wherein the specific method of step 3 is as follows:

7. The defect detection method for aerial images of power transmission lines based on an improved YOLOv5 network as claimed in claim 1, wherein the specific method of step 4 is as follows:

step 4.2, evaluating the network using precision, recall, and mean average precision (mAP).