CN116895030B - Insulator detection method based on target detection algorithm and attention mechanism

Insulator detection method based on target detection algorithm and attention mechanism

Info

Publication number: CN116895030B (application number CN202311163428.3A)
Authority: CN (China)
Prior art keywords: module, attention, feature map, target, frame
Priority and filing date: 2023-09-11
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN116895030A (published 2023-10-17; grant CN116895030B published 2023-11-17)
Inventors: 经弈逍, 高淋锋, 吴昀璞, 黄永茂
Current and original assignee: Xihua University
Application filed by Xihua University


Classifications

    • G06V20/17: Terrestrial scenes taken from planes or by drones
    • G06N3/045: Combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/048: Activation functions
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Arrangements for image or video recognition or understanding using neural networks
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention relates to the technical field of image processing and provides an insulator detection method based on a target detection algorithm and an attention mechanism, which comprises the following steps: acquiring real-shot images of a power supply station to form a training image set; inputting the training image set into a YOLOv8 model for training to obtain an insulator prediction model, the YOLOv8 model comprising a backbone network, a neck network and a head network, wherein the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module; the training images pass through the multi-scale convolution module, which shrinks the feature map and enlarges the number of channels, features are extracted by the C2f module, and global information is captured by the global attention module; after the neck network extracts and integrates the features again, the head network outputs the detected targets; finally, images acquired in real time are input into the trained insulator detection model to obtain insulator targets. The invention reduces the size and complexity of the model while ensuring detection precision, and improves the efficiency and accuracy of unmanned aerial vehicle inspection of insulators.

Description

Insulator detection method based on target detection algorithm and attention mechanism
Technical Field
The invention relates to the technical field of image processing, in particular to an insulator detection method based on a target detection algorithm and an attention mechanism.
Background
In an electric power system, insulators are critical components: they insulate the conductors from the pole towers and prevent current leakage along the power line. The state of the insulators therefore directly affects the safe and stable operation of the power system. Detecting insulators, and in particular detecting whether they are defective, is an important link in power system fault prevention and fault diagnosis. Only when the position and state of the insulators are accurately detected can subsequent fault detection and fault prevention work be performed effectively, thereby ensuring the safe operation of the power system.
However, insulator detection is a challenging task because of the variability of insulator shape, size and color and the complexity of the background environment. This is especially true in unmanned aerial vehicle inspection: since the computational and memory capacity of a drone is limited, strict requirements are placed on the size and complexity of the detection model, which further increases the difficulty of insulator detection. Although some deep-learning-based methods have been proposed to solve this problem, they often suffer from issues such as oversized models, high computational complexity and low detection accuracy. These problems limit their effectiveness in practical applications, particularly in drone-based insulator inspection.
In addition, the current drone inspection workflow is to remove the hard disk carried by the drone and then import the data into a computer for analysis back in the studio. Although this approach can guarantee detection precision, it is inefficient, time-consuming, requires manual participation and cannot be automated. Therefore, how to reduce the size and complexity of the model and improve the efficiency and accuracy of drone-based insulator inspection while ensuring detection precision is an important direction of current research.
Disclosure of Invention
The invention aims to ensure the detection precision, reduce the size and complexity of a model, improve the efficiency and accuracy of unmanned aerial vehicle inspection of an insulator, and provide an insulator detection method based on a target detection algorithm and an attention mechanism.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
An insulator detection method based on a target detection algorithm and an attention mechanism, the method comprising the following steps:
step 1, acquiring real-shot images of a plurality of power supply stations, and forming a training image set after image preprocessing;
step 2, inputting the training image set into a YOLOv8 model for training to obtain a trained insulator prediction model; the YOLOv8 model comprises a backbone network, a neck network and a head network; the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module; the training images pass through the multi-scale convolution module, which shrinks the feature map and enlarges the number of channels; features are extracted by the C2f module and global information is captured by the global attention module; after the neck network extracts and integrates the features again, the head network outputs the detected targets;
and step 3, inputting the images acquired in real time into the trained insulator detection model to obtain the insulator target.
In this scheme, the YOLOv8 model introduces a global attention mechanism to capture more global information in the feature map.
The specific steps of step 1 include: collecting real-shot images of a power supply station, scaling the images proportionally, mirror-flipping them, adding noise squares of random size and applying color transformation as preprocessing, and manually labeling the insulators in the images to form the training image set.
In this scheme, the preprocessing operations on the images help to improve the generalization capability of the model, so that it shows a good detection effect in various practical application scenarios.
In step 2, the backbone network includes a convolution layer Conv_1, a convolution layer Conv_2, a C2f_1 module, a convolution layer Conv_3, a C2f_2 module, a convolution layer Conv_4, a C2f_3 module, a convolution layer Conv_5, a C2f_4 module, a global attention module GAM, and a spatial pyramid module SPPF, connected in sequence; the scales of the 5 convolution layers Conv are P1/2, P2/4, P3/8, P4/16 and P5/32, respectively.
In the above scheme, the backbone network is mainly responsible for extracting image features. The convolution layers gradually reduce the size of the image while increasing the number of channels so as to extract richer features; the C2f modules, each comprising a convolution layer and a plurality of residual layers, further extract and integrate the features while preserving the information of the original features. Finally, through the combined action of the global attention module GAM and the spatial pyramid module SPPF, multi-scale features are fused and the complex relationships between the global spatial information of the feature map and the different channels are captured.
In step 2, the neck network includes a convolutional layer Conv_6, a convolutional layer Conv_7, an upsampling layer Upsample_1, an upsampling layer Upsample_2, a C2f_5 module, a C2f_6 module, a C2f_7 module, a C2f_8 module, a concatenation layer Concat_1, a concatenation layer Concat_2, a concatenation layer Concat_3, and a concatenation layer Concat_4;
after the feature map F1 output by the spatial pyramid module SPPF passes through the upsampling layer Upsample_1, it is fused with the feature map F2 output by the C2f_3 module at the concatenation layer Concat_1; the feature map output by the concatenation layer Concat_1 undergoes feature extraction by the C2f_5 module and passes through the upsampling layer Upsample_2, and is then fused with the feature map F3 output by the C2f_2 module at the concatenation layer Concat_2; the feature map output by the concatenation layer Concat_2 undergoes feature extraction by the C2f_6 module and, after passing through the convolution layer Conv_6, is fused with the feature map output by the C2f_5 module at the concatenation layer Concat_3; the feature map output by the concatenation layer Concat_3 undergoes feature extraction by the C2f_7 module and, after passing through the convolution layer Conv_7, is fused with the feature map F1 output by the spatial pyramid module SPPF at the concatenation layer Concat_4; the feature map output by the concatenation layer Concat_4 undergoes feature extraction by the C2f_8 module.
In this scheme, the neck network is mainly responsible for finely extracting features and fusing feature maps of different scales: the feature maps are enlarged by the upsampling layers Upsample for fine-grained target detection; feature maps of different scales are fused by the concatenation layers Concat to obtain richer feature information; and the fused feature maps are further extracted and integrated by the C2f modules.
In step 2, the head network includes a target detection module Delect_1, a target detection module Delect_2, and a target detection module Delect_3;
the target detection module Delect_1 performs target detection on the features extracted by the C2f_6 module in the Neck network Neck and outputs a target 1; the target detection module Delect_2 performs target detection on the features extracted by the C2f_7 module and outputs a target 2; the target detection module Delect_3 performs target detection on the features extracted by the C2f_8 module and outputs a target 3; finally, target 1, target 2 and target 3 are fused and the final insulator target is output.
In the above scheme, the head network generates the target detection results from the features extracted by the C2f modules through the target detection modules Delect, where each target detection module Delect comprises a convolution layer and a Sigmoid activation function and converts the features into the category and location information of the target.
The global attention module GAM is implemented by a global attention mechanism, and the global attention module GAM comprises a spatial attention module;
the spatial attention module is realized by a self-attention mechanism; it processes the input feature map through 1×1 convolution operations and generates a query, a key and a value:

$$Q = W_q X,\qquad K = W_k X,\qquad V = W_v X$$

where $Q$ is the query, $K$ is the key and $V$ is the value; $X$ is the input feature map; $W_q$, $W_k$, $W_v$ are the weight matrices of the query, key and value, respectively.

Multiplying the query by the transpose of the key row-wise gives the attention score matrix:

$$S = Q K^{T}$$

where $S$ is the attention score and $K^{T}$ is the transpose of $K$.

A Softmax function is applied to the scores of each row so that the scores of each row sum to 1, yielding the attention weight matrix:

$$A = \mathrm{Softmax}(S)$$

where $A$ is the attention weight. Softmax converts a real vector into a probability distribution, i.e. it maps every element into the interval [0, 1] and makes all elements sum to 1; it is formulated as

$$\mathrm{Softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$

where $z = (z_1, \ldots, z_K)$ is a $K$-dimensional real vector and $\mathrm{Softmax}(z)_j$ is the $j$-th element of the vector after conversion by the Softmax function.

Matrix multiplication of the values with the attention weight matrix gives the weighted feature map:

$$O = A V$$

where $O$ is the weighted sum.

After multiplying by an adaptively learned parameter $\gamma$ and adding the input feature map, the spatial attention module finally outputs a spatial attention profile:

$$F_{sa} = \gamma O + X$$

where $\gamma$ is a learnable parameter and $F_{sa}$ is the spatial attention profile.
In the above scheme, the spatial attention module is realized by a self-attention mechanism, which is introduced to capture long-distance dependencies in the feature map: in a traditional convolutional neural network, the information of a pixel can only be propagated through a neighborhood of the convolution kernel size, whereas the self-attention mechanism allows the information of a pixel to be propagated to any position of the feature map, thereby capturing more global information. The parameter γ plays a regulating role; it is a learnable parameter that is adjusted automatically during training to control how strongly the self-attention mechanism influences the final spatial attention profile.
The global attention module GAM further includes a channel attention module;
the channel attention module transposes and reshapes the input feature map, then applies a nonlinear transformation through a first fully connected layer and a ReLU activation function; the channel attention profile is then obtained through a second fully connected layer and a Sigmoid activation function:

$$F_{ca} = \sigma\!\left( W_2\, \mathrm{ReLU}\!\left( W_1 \hat{X} \right) \right)$$

where $F_{ca}$ is the channel attention profile; $\hat{X}$ denotes the transposed and reshaped input feature map; $\sigma$ is the Sigmoid activation function, which compresses the values of the feature map into the interval [0, 1]; $W_1$ is the weight of the first fully connected layer and $W_2$ the weight of the second fully connected layer; ReLU is the activation function $f(x) = \max(0, x)$, used to enhance the nonlinearity of the model.
In the above scheme, the two fully connected layers can provide more channel-association information.
The forward propagation process of the global attention module GAM is:

$$Y = F_{sa} \otimes F_{ca} \otimes X + \mathrm{Residual}$$

where $Y$ is the feature map output by forward propagation; $\otimes$ denotes element-wise (corresponding-element) multiplication; and Residual is the residual connection, i.e. the input feature map passed directly to the output.
In the above scheme, the spatial attention profile and the channel attention profile are multiplied by the feature map input to the global attention module GAM element by element, so that the feature intensity of each position in the original feature map can be adjusted; a residual connection is then added to preserve some important information of the original feature map, preventing loss in the self-attention mechanism. The residual connection ensures that the performance of the network is at least not degraded by introducing a short circuit mechanism so that the input can be passed directly to the output.
The loss function DIoU_max of the YOLOv8 model is:

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

$$\rho_x^2 = \left( \frac{b1_{x1} + b1_{x2}}{2} - \frac{b2_{x1} + b2_{x2}}{2} \right)^{2},\qquad \rho_y^2 = \left( \frac{b1_{y1} + b1_{y2}}{2} - \frac{b2_{y1} + b2_{y2}}{2} \right)^{2}$$

$$L_{\mathrm{DIoU\_max}} = 1 - 1.3\,\mathrm{IoU} + \frac{\max\!\left( \rho_x^2, \rho_y^2 \right)}{c^2}$$

where IoU is the area of the intersection of the prediction frame and the real frame (Area of Overlap) divided by the area of their union (Area of Union); $\rho_x^2$ is the square of the distance on the x-axis between the center of the prediction frame and the center of the real frame, and $\rho_y^2$ the square of the distance on the y-axis; $b1_{x1}$ is the x-coordinate of the left vertex of the prediction frame, $b1_{x2}$ the x-coordinate of its right vertex, $b1_{y1}$ the y-coordinate of its upper-edge vertex and $b1_{y2}$ the y-coordinate of its lower-edge vertex; $b2_{x1}$, $b2_{x2}$, $b2_{y1}$ and $b2_{y2}$ are the corresponding coordinates of the real frame; $c$ is the diagonal length of the minimum enclosing box containing the prediction frame and the real frame; and $L_{\mathrm{DIoU\_max}}$ is the loss function of the YOLOv8 model.
In this scheme, not only the distance between the center points of the prediction frame and the real frame is considered, but also the diagonal length of the minimum enclosing box containing both frames, so DIoU_max can better handle prediction and real frames that overlap heavily, thereby improving the performance of the YOLOv8 model. Multiplying IoU by the coefficient 1.3 in the DIoU_max loss function emphasizes the reward for high-overlap predictions; this strategy effectively alleviates the class-imbalance problem, improving the accuracy of the YOLOv8 model on positive samples and hence the overall detection performance.
Compared with the prior art, the invention has the beneficial effects that:
the invention utilizes an improved object detection algorithm YOLOv8 model, firstly, preprocessing the real shot image, enhancing the number of the image and improving the generalization capability, and then training the YOLOv8 model by using the image to enable the model to identify possible insulator positions. Meanwhile, an attention mechanism and a bounding box regression loss function are introduced to the YOLOv8 model, so that the model can pay more attention to targets in images and balance positive and negative samples, and the detection index is improved.
Experimental results show that the method outperforms traditional methods on the insulator detection task without significantly increasing training difficulty or model size; the invention therefore provides an effective scheme for insulator detection in power systems.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a backbone network in the YOLOv8 model of the present invention;
FIG. 3 is a schematic diagram of a network structure of a neck network and a head network in a YOLOv8 model of the present invention;
FIG. 4 is a schematic diagram of a global attention module in a backbone network according to the present invention;
FIG. 5 is a schematic diagram of a network structure of a spatial attention module in a global attention module according to the present invention;
fig. 6 is a schematic diagram of a network structure of the C2f module of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.
Example 1
The invention is realized by the following technical scheme, as shown in fig. 1, an insulator detection method based on a target detection algorithm and an attention mechanism comprises the following steps:
and step 1, acquiring real shot images of a plurality of power supply stations, and forming a training chart set after image preprocessing.
As an implementation, 1124 real-shot images of the 110 V and 220 V equipment of a certain power supply station are collected, and the images are scaled to a size of 928 x 512, which maximally preserves the information of the original images. The adjusted images then undergo a series of preprocessing operations, including mirror flipping, addition of noise squares of random size, color transformation, and so on. The noise squares of random size simulate various kinds of interference in the actual environment, making the model more robust, while the color transformation makes the model insensitive to the color distribution of an image, improving its generalization capability. These operations are intended to let the model handle insulator detection under various environmental conditions while remaining insensitive to color. The preprocessing improves the generalization capability of the model so that it shows a good detection effect in various practical application scenarios. After image preprocessing, the number of images increases to 2248, and the training image set is formed after the insulators in the images are manually labeled.
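A minimal sketch of this augmentation pipeline is given below. It is an illustration only, not the patent's actual code: the noise-block size range and the color-enhancement factors are assumed values, and the corresponding transformation of the label boxes (needed for mirror flipping) is omitted.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(image: Image.Image) -> list[Image.Image]:
    """Produce augmented variants of one RGB training image."""
    base = image.resize((928, 512))                       # scale to the working size
    out = [base]

    # mirror flip
    out.append(base.transpose(Image.FLIP_LEFT_RIGHT))

    # noise square of random size at a random position
    arr = np.array(base).copy()
    h, w = arr.shape[:2]
    s = random.randint(16, 96)                            # assumed size range
    y, x = random.randint(0, h - s), random.randint(0, w - s)
    arr[y:y + s, x:x + s] = np.random.randint(0, 256, (s, s, 3), dtype=np.uint8)
    out.append(Image.fromarray(arr))

    # color transformation: randomly shift color saturation
    out.append(ImageEnhance.Color(base).enhance(random.uniform(0.5, 1.5)))
    return out
```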
Step 2, inputting the training image set into a YOLOv8 model for training to obtain a trained insulator prediction model. The YOLOv8 model comprises a backbone network, a neck network and a head network; the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module. The training images pass through the multi-scale convolution module, which shrinks the feature maps and enlarges the number of channels; features are extracted by the C2f module and global information is captured by the global attention module. After the neck network extracts and integrates the features again, the head network outputs the detected targets.
After evaluating the mainstream target detection algorithms, the method selects the YOLOv8 model and, considering the limits of training resources and edge AI, finally decides to train with an improved YOLOv8 model. The improved YOLOv8 model mainly features a smaller depth and width, with a maximum channel number of 1024, so its size and complexity are lower, making it suitable for drones with limited computational and storage capacity.
Referring to fig. 2 and 3, the YOLOv8 model improved by the present invention includes a Backbone network Backbone, a Neck network Neck, and a Head network Head.
Referring to fig. 2, the Backbone network Backbone is mainly responsible for extracting image features and includes several convolution layers Conv of different scales, several C2f modules, a global attention module GAM, and a spatial pyramid module SPPF. In detail, the Backbone network Backbone includes a convolution layer Conv_1, a convolution layer Conv_2, a C2f_1 module, a convolution layer Conv_3, a C2f_2 module, a convolution layer Conv_4, a C2f_3 module, a convolution layer Conv_5, a C2f_4 module, the global attention module GAM, and the spatial pyramid module SPPF, connected in sequence. The scales of the 5 convolution layers Conv are P1/2, P2/4, P3/8, P4/16 and P5/32, respectively; the convolution layers Conv shrink the feature map and increase the number of channels.
In fig. 2, Conv_1 is shorthand for the convolution layer Conv_1 and C2f_1 for the C2f_1 module; the suffixes "_1", "_2", etc. are only used for convenience of distinction. For example, the convolution layers Conv_1 and Conv_2 have identical structures, as do the C2f_1 and C2f_2 modules, and so on. GAM is shorthand for the global attention module GAM and SPPF for the spatial pyramid module SPPF. F1 is the feature map output by the spatial pyramid module SPPF, F2 the feature map output by the C2f_3 module, and F3 the feature map output by the C2f_2 module.
The training images pass through the convolution layers Conv_1 and Conv_2, which gradually reduce the image size while increasing the number of channels so as to extract richer features. The features are further extracted and integrated by the C2f_1 module, which comprises a connected convolution layer and residual layer, so richer features can be extracted while the information of the original features is preserved. The subsequent convolution layers Conv and C2f modules follow the same principle. Finally, the feature map output by the C2f_4 module passes through the global attention module GAM and the spatial pyramid module SPPF, whose combined action realizes the fusion of multi-scale features and successfully captures the complex relationships between the global spatial information of the feature map and the different channels.
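For reference, a simplified sketch of an SPPF block of the kind named here follows; the public YOLOv8 SPPF additionally halves the channels with a 1x1 convolution before pooling, a detail omitted in this reading.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Simplified spatial pyramid pooling (fast): repeated 5x5 max-pooling
    at stride 1, concatenation, and a 1x1 fusion convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.fuse = nn.Conv2d(channels * 4, channels, kernel_size=1)

    def forward(self, x):
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.fuse(torch.cat([x, y1, y2, y3], dim=1))
```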
The C2f module (CSP Bottleneck with two convolutions) comprises convolution layers and a plurality of residual layers. Referring to fig. 6, which shows the network structure of the C2f module, the input features are first processed by a ConvBNSiLU convolution layer, and the processed features are then divided into two parts by a Split operation; one part is passed through directly, while the other is further refined by n BottleNeck structures. Finally, the features of the two parts are fused and output through a Concat operation (the dotted line in fig. 6 is an input of the Concat).
In fig. 6, BottleNeck_1 is shorthand for the 1st BottleNeck structure and BottleNeck_n for the n-th BottleNeck structure; the suffixes "_1", "_2", etc. are only used for convenience of distinction, and the internal structures of BottleNeck_1 and BottleNeck_2 are identical. The BottleNeck structure performs 1×1 convolution operations.
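A minimal PyTorch sketch of the C2f structure as described (ConvBNSiLU, Split, n BottleNecks, Concat) is shown below. Kernel sizes and the half-channel split are assumptions borrowed from the public YOLOv8 design rather than details stated in the patent.

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class BottleNeck(nn.Module):
    """Residual block: output = input + conv(conv(input))."""
    def __init__(self, c: int):
        super().__init__()
        self.cv1 = ConvBNSiLU(c, c, k=3)
        self.cv2 = ConvBNSiLU(c, c, k=3)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))   # residual keeps the original feature info

class C2f(nn.Module):
    def __init__(self, c_in: int, c_out: int, n: int = 1):
        super().__init__()
        self.cv1 = ConvBNSiLU(c_in, c_out)                  # initial ConvBNSiLU
        self.m = nn.ModuleList(BottleNeck(c_out // 2) for _ in range(n))
        self.cv2 = ConvBNSiLU(c_out // 2 * (n + 2), c_out)  # fusion after Concat

    def forward(self, x):
        a, b = self.cv1(x).chunk(2, dim=1)      # Split into two halves
        ys = [a, b]                             # one half is passed through directly
        for blk in self.m:
            ys.append(blk(ys[-1]))              # the other goes through n BottleNecks
        return self.cv2(torch.cat(ys, dim=1))   # Concat and fuse
```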
The Neck network Neck is mainly responsible for finely extracting features and fusing feature maps of different scales. It comprises several convolution layers Conv, several upsampling layers Upsample, several C2f modules and several concatenation layers Concat: the feature maps are enlarged by the upsampling layers Upsample for fine-grained target detection; feature maps of different scales are fused by the concatenation layers Concat to obtain richer feature information; and the fused feature maps are further extracted and integrated by the C2f modules. In detail, referring to fig. 3, the Neck network Neck includes a convolution layer Conv_6, a convolution layer Conv_7, an upsampling layer Upsample_1, an upsampling layer Upsample_2, a C2f_5 module, a C2f_6 module, a C2f_7 module, a C2f_8 module, a concatenation layer Concat_1, a concatenation layer Concat_2, a concatenation layer Concat_3, and a concatenation layer Concat_4.
After the feature map F1 output by the spatial pyramid module SPPF in the Backbone network Backbone passes through the upsampling layer Upsample_1, it is fused with the feature map F2 output by the C2f_3 module at the concatenation layer Concat_1. The feature map output by the concatenation layer Concat_1 undergoes feature extraction by the C2f_5 module and passes through the upsampling layer Upsample_2, and is then fused with the feature map F3 output by the C2f_2 module at the concatenation layer Concat_2. The feature map output by the concatenation layer Concat_2 undergoes feature extraction by the C2f_6 module and, after passing through the convolution layer Conv_6, is fused with the feature map output by the C2f_5 module at the concatenation layer Concat_3. The feature map output by the concatenation layer Concat_3 undergoes feature extraction by the C2f_7 module and, after passing through the convolution layer Conv_7, is fused with the feature map F1 output by the spatial pyramid module SPPF at the concatenation layer Concat_4. The feature map output by the concatenation layer Concat_4 undergoes feature extraction by the C2f_8 module.
In fig. 3, Concat_1 is shorthand for the concatenation layer Concat_1, Upsample_1 for the upsampling layer Upsample_1 and C2f_5 for the C2f_5 module; the suffixes "_1", "_2", etc. are only used for convenience of distinction. For example, the concatenation layers Concat_1 and Concat_2 have identical structures, as do the upsampling layers Upsample_1 and Upsample_2, and so on.
The Head network Head is mainly responsible for performing target detection according to features extracted by the Neck network Neck, please continue to refer to fig. 3, and includes 3 target detection modules, namely a target detection module delect_1, a target detection module delect_2, and a target detection module delect_3.
In fig. 3, delete_1 is a shorthand for the target detection module delete_1, delete_2 is a shorthand for the target detection module delete_2, and delete_3 is a shorthand for the target detection module delete_3.
The target detection module Delect_1 performs target detection on the features extracted by the C2f_6 module in the Neck network Neck and outputs a target 1; the target detection module Delect_2 performs target detection on the features extracted by the C2f_7 module and outputs a target 2; the target detection module Delect_3 performs target detection on the features extracted by the C2f_8 module and outputs a target 3. Finally, target 1, target 2 and target 3 are fused and the final insulator target is output.
The Head network Head generates a detection result of the target according to the characteristics extracted by the C2f module through a target detection module Delect, wherein the target detection module Delect comprises a convolution layer and a Sigmoid activation function, and can convert the characteristics into category and position information of the target.
Referring to fig. 4, the global attention module GAM in the Backbone network Backbone includes a spatial attention module and a channel attention module. Referring to fig. 5, the spatial attention module is implemented by a self-attention mechanism: first, the input feature map is processed through 1×1 convolution operations to generate a Query, a Key and a Value; the query is then multiplied by the transpose of the key to obtain an attention score matrix; a Softmax function is applied to the scores of each row so that the scores of each row sum to 1, yielding the attention weight matrix; finally, the values are matrix-multiplied with the attention weight matrix to obtain a weighted feature map, which is multiplied by an adaptively learned parameter γ and added to the input feature map, so that the spatial attention module finally outputs a spatial attention profile.
In fig. 5, 1×1conv represents a 1×1 convolution operation, and Softmax represents a Softmax function.
The parameter gamma plays a role in regulation, is a learnable parameter, and can be automatically regulated in the training process to control the influence degree of the self-attention mechanism on the final spatial attention distribution map. The self-attention mechanism has the main advantages that long-distance dependency in the feature map can be captured, in a traditional convolutional neural network, information of one pixel point can be transferred only through a neighborhood with the size of a convolution kernel, and the self-attention mechanism can transfer the information of one pixel point to any position of the feature map, so that more global information is captured.
The operation flow of the spatial attention module is:

$$Q = W_q X,\qquad K = W_k X,\qquad V = W_v X$$

$$S = Q K^{T}$$

$$A = \mathrm{Softmax}(S)$$

$$O = A V$$

$$F_{sa} = \gamma O + X$$

where $Q$ is the Query, $K$ the Key and $V$ the Value; $X$ is the input feature map; $W_q$, $W_k$, $W_v$ are the weight matrices of the query, key and value, respectively; $S$ is the attention score; $K^{T}$ is the transpose of $K$; $A$ is the attention weight; Softmax converts a real vector into a probability distribution, i.e. it maps every element into the interval [0, 1] and makes all elements sum to 1, formulated as $\mathrm{Softmax}(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$, where $z = (z_1, \ldots, z_K)$ is a $K$-dimensional real vector and $\mathrm{Softmax}(z)_j$ is the $j$-th element of the vector $z$ after conversion by the Softmax function; $O$ is the weighted sum; $\gamma$ is a learnable parameter; and $F_{sa}$ is the spatial attention profile.
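The following sketch implements exactly this chain (Q, K, V from 1×1 convolutions, S = QKᵀ, row-wise Softmax, O = AV, then γO + X); the tensor layout and the zero initialization of γ are assumptions, not details stated in the patent.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.wq = nn.Conv2d(channels, channels, kernel_size=1)  # W_q as a 1x1 conv
        self.wk = nn.Conv2d(channels, channels, kernel_size=1)  # W_k
        self.wv = nn.Conv2d(channels, channels, kernel_size=1)  # W_v
        self.gamma = nn.Parameter(torch.zeros(1))               # learnable gamma

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.wq(x).flatten(2).transpose(1, 2)    # Q: (B, HW, C)
        k = self.wk(x).flatten(2)                    # K^T view: (B, C, HW)
        v = self.wv(x).flatten(2).transpose(1, 2)    # V: (B, HW, C)
        s = q @ k                                    # S = Q K^T: (B, HW, HW)
        a = s.softmax(dim=-1)                        # row-wise Softmax -> A
        o = (a @ v).transpose(1, 2).reshape(b, c, h, w)  # O = A V, back to (B, C, H, W)
        return self.gamma * o + x                    # F_sa = gamma * O + X
```

Since the attention matrix is HW × HW, this module is affordable only on small feature maps, which matches its placement at the very end of the backbone.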
With continued reference to fig. 4, the channel attention module transposes and reshapes the input feature map, then applies a nonlinear transformation through a first fully connected layer and a ReLU activation function; the channel attention profile is then obtained through a second fully connected layer and a Sigmoid activation function. The two fully connected layers can provide more channel-association information.
In fig. 4, reLU represents a ReLU activation function, sigmoid represents a Sigmoid activation function.
The operation flow of the channel attention module is:

$$F_{ca} = \sigma\!\left( W_2\, \mathrm{ReLU}\!\left( W_1 \hat{X} \right) \right)$$

where $F_{ca}$ is the channel attention profile; $\hat{X}$ denotes the transposed and reshaped input feature map; $\sigma$ is the Sigmoid activation function, used to compress the values of the feature map into the interval [0, 1] and represent the attention weights; $W_1$ is the weight of the first fully connected layer and $W_2$ the weight of the second fully connected layer; ReLU is the activation function $f(x) = \max(0, x)$, used to enhance the nonlinearity of the model.
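A matching sketch of the channel attention branch follows; the reduction ratio r is an assumption, since the text only fixes the two fully connected layers, the ReLU between them and the final Sigmoid.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)   # first fully connected layer (W_1)
        self.fc2 = nn.Linear(channels // r, channels)   # second fully connected layer (W_2)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        # transpose and reshape: one C-dimensional vector per spatial position
        y = x.permute(0, 2, 3, 1).reshape(b, h * w, c)
        y = torch.relu(self.fc1(y))                     # nonlinear transformation
        y = torch.sigmoid(self.fc2(y))                  # compress values into (0, 1)
        return y.reshape(b, h, w, c).permute(0, 3, 1, 2)  # F_ca with the shape of x
```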
During the forward propagation of the global attention module GAM, the spatial attention profile and the channel attention profile are multiplied element by element with the feature map input to the global attention module GAM, so that the feature intensity of each position in the original feature map can be adjusted; a residual connection is then added to preserve some important information of the original feature map, preventing loss in the self-attention mechanism. The residual connection ensures that the performance of the network is at least not degraded by introducing a short circuit mechanism so that the input can be passed directly to the output.
The operation flow of forward propagation is:

$$Y = F_{sa} \otimes F_{ca} \otimes X + \mathrm{Residual}$$

where $Y$ is the feature map output by forward propagation; $\otimes$ denotes element-wise (corresponding-element) multiplication; and Residual is the residual connection, i.e. the input feature map passed directly to the output.
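Combining the two sketches above gives one plausible reading of the GAM forward pass, with the residual taken as the unmodified input:

```python
import torch.nn as nn

class GAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.sa = SpatialSelfAttention(channels)   # from the sketch above
        self.ca = ChannelAttention(channels)       # from the sketch above

    def forward(self, x):
        f_sa = self.sa(x)        # spatial attention profile F_sa
        f_ca = self.ca(x)        # channel attention profile F_ca
        y = f_sa * f_ca * x      # element-wise reweighting of the input
        return y + x             # residual connection preserves the input
```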
In the training process of the YOLOv8 model, a DIoU_max loss function is introduced. It is an improved IoU loss function that considers not only the distance between the center points of the prediction frame and the real frame but also the diagonal length of the minimum enclosing box containing both frames. Specifically, the scheme calculates the distances between the center points of the prediction frame and the real frame on the x-axis and on the y-axis, takes the maximum of the two, and divides it by the diagonal length of the minimum enclosing box, obtaining a value that measures the center-point offset; finally, this value is subtracted from IoU to yield DIoU_max. Compared with the conventional IoU, DIoU_max better handles prediction and real frames that overlap heavily but still have a large position offset, thereby improving the performance of the YOLOv8 model.
The DIoU_max loss function is:

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

$$\rho_x^2 = \left( \frac{b1_{x1} + b1_{x2}}{2} - \frac{b2_{x1} + b2_{x2}}{2} \right)^{2},\qquad \rho_y^2 = \left( \frac{b1_{y1} + b1_{y2}}{2} - \frac{b2_{y1} + b2_{y2}}{2} \right)^{2}$$

$$L_{\mathrm{DIoU\_max}} = 1 - 1.3\,\mathrm{IoU} + \frac{\max\!\left( \rho_x^2, \rho_y^2 \right)}{c^2}$$

where IoU is the area of the intersection of the prediction frame and the real frame (Area of Overlap) divided by the area of their union (Area of Union); $\rho_x^2$ is the square of the distance on the x-axis between the center of the prediction frame and the center of the real frame, and $\rho_y^2$ the square of the distance on the y-axis; $b1_{x1}$ is the x-coordinate of the left vertex of the prediction frame, $b1_{x2}$ the x-coordinate of its right vertex, $b1_{y1}$ the y-coordinate of its upper-edge vertex and $b1_{y2}$ the y-coordinate of its lower-edge vertex; $b2_{x1}$, $b2_{x2}$, $b2_{y1}$ and $b2_{y2}$ are the corresponding coordinates of the real frame; $c$ is the diagonal length of the minimum enclosing box containing the prediction frame and the real frame; and $L_{\mathrm{DIoU\_max}}$ is the loss function of the YOLOv8 model.
In a training image set, the number of pixels of negative samples (i.e. background or non-target objects) is much greater than that of positive samples (i.e. target objects). This imbalance can bias the YOLOv8 model toward predicting negative samples, neglecting positive samples and reducing detection accuracy. The improved DIoU_max loss function therefore multiplies IoU by the coefficient 1.3 to emphasize the reward for high-overlap predictions; this strategy effectively alleviates the class-imbalance problem, improving the accuracy of the YOLOv8 model on positive samples and hence the overall detection performance.
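A sketch of the reconstructed DIoU_max loss follows. The placement of the 1.3 coefficient on the IoU term is an assumption based on the description above; boxes are (x1, y1, x2, y2) tensors of shape (N, 4).

```python
import torch

def diou_max_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # intersection / union -> IoU
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # squared center distances on each axis; the larger one is penalized
    dx2 = ((pred[:, 0] + pred[:, 2]) / 2 - (target[:, 0] + target[:, 2]) / 2) ** 2
    dy2 = ((pred[:, 1] + pred[:, 3]) / 2 - (target[:, 1] + target[:, 3]) / 2) ** 2

    # squared diagonal of the minimum enclosing box
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + 1e-7

    return (1 - 1.3 * iou + torch.max(dx2, dy2) / c2).mean()
```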
Step 3, inputting the images acquired in real time into the trained insulator detection model to obtain the insulator target.
Example 2
In this example, experimental verification is performed on the basis of embodiment 1 above. To compare the performance of the YOLOv8 model of the present invention with other conventional models, accuracy is compared along five dimensions: precision, recall, F1, mAP50 and mAP50-95. In addition, since the YOLOv8 model aims at reducing model size and training difficulty, the models are also compared on four indicators: training video memory, weight size, training speed and prediction speed.
The calculation formula of F1 is:

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

where precision represents the precision rate and recall the recall rate. The precision formula is:

$$\mathrm{precision} = \frac{TP}{TP + FP}$$

where TP (True Positive) is the number of samples that are actually positive and predicted positive, and FP (False Positive) is the number of samples that are actually negative but predicted positive.

The calculation formula of the recall rate is:

$$\mathrm{recall} = \frac{TP}{TP + FN}$$

where FN (False Negative) is the number of samples that are actually positive but predicted negative.
mAP50: ioU is greater than 0.5.
mAP50-95: ioU threshold values are from 0.5 to 0.95, with values taken every 0.05, the average value of the mAP below these threshold values.
Table 1 compares the performance of each model; the optimizer is SGD with mixed precision, the loss function is IoU (the YOLOv8 of the present invention uses the improved DIoU_max loss function), the batch size is 16, the number of training epochs is 300, and the image size is 300 x 300.
Table 1 comparison of the performance of the various models
In table 1, FasterRCNN is a fast target detection algorithm based on convolutional neural networks, SSD is a single-stage target detector, and YOLOv8 is a single-stage target detection model; the scheme of the invention is an improvement based on YOLOv8.
FasterRCNN and SSD were not tested on mAP50-95. Clearly, compared with FasterRCNN and SSD, YOLOv8 shows an obvious improvement on all indicators; and when the improved YOLOv8 of the invention introduces the self-attention mechanism, the recall rate drops slightly by 0.05% while precision, mAP50 and mAP50-95 all improve.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. An insulator detection method based on a target detection algorithm and an attention mechanism, characterized in that the method comprises the following steps:
step 1, acquiring real-shot images of a plurality of power supply stations, and forming a training image set after image preprocessing;
step 2, inputting the training image set into a YOLOv8 model for training to obtain a trained insulator prediction model; the YOLOv8 model comprises a backbone network, a neck network and a head network; the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module; the training images pass through the multi-scale convolution module, which shrinks the feature map and enlarges the number of channels; features are extracted by the C2f module and global information is captured by the global attention module; after the neck network extracts and integrates the features again, the head network outputs the detected targets;
in step 2, the backbone network includes a convolution layer Conv_1, a convolution layer Conv_2, a C2f_1 module, a convolution layer Conv_3, a C2f_2 module, a convolution layer Conv_4, a C2f_3 module, a convolution layer Conv_5, a C2f_4 module, a global attention module GAM, and a spatial pyramid module SPPF, connected in sequence; the scales of the 5 convolution layers Conv are P1/2, P2/4, P3/8, P4/16 and P5/32, respectively;
the global attention module GAM is implemented by a global attention mechanism, and the global attention module GAM comprises a spatial attention module;
the spatial attention module is realized by a self-attention mechanism; it processes the input feature map through 1×1 convolution operations and generates a query, a key and a value:

$$Q = W_q X,\qquad K = W_k X,\qquad V = W_v X$$

where $Q$ is the query, $K$ is the key and $V$ is the value; $X$ is the input feature map; $W_q$, $W_k$, $W_v$ are the weight matrices of the query, key and value, respectively;

multiplying the query by the transpose of the key row-wise gives the attention score matrix:

$$S = Q K^{T}$$

where $S$ is the attention score and $K^{T}$ is the transpose of $K$;

a Softmax function is applied to the scores of each row so that the scores of each row sum to 1, yielding the attention weight matrix:

$$A = \mathrm{Softmax}(S)$$

where $A$ is the attention weight; Softmax converts a real vector into a probability distribution, i.e. it maps every element into the interval [0, 1] and makes all elements sum to 1, formulated as $\mathrm{Softmax}(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$, where $z = (z_1, \ldots, z_K)$ is a $K$-dimensional real vector and $\mathrm{Softmax}(z)_j$ is the $j$-th element of the vector $z$ after conversion by the Softmax function;

matrix multiplication of the values with the attention weight matrix gives the weighted feature map:

$$O = A V$$

where $O$ is the weighted sum;

after multiplying by an adaptively learned parameter $\gamma$ and adding the input feature map, the spatial attention module finally outputs a spatial attention profile:

$$F_{sa} = \gamma O + X$$

where $\gamma$ is a learnable parameter and $F_{sa}$ is the spatial attention profile;

the global attention module GAM further includes a channel attention module;

the channel attention module transposes and reshapes the input feature map, then applies a nonlinear transformation through a first fully connected layer and a ReLU activation function; the channel attention profile is then obtained through a second fully connected layer and a Sigmoid activation function:

$$F_{ca} = \sigma\!\left( W_2\, \mathrm{ReLU}\!\left( W_1 \hat{X} \right) \right)$$

where $F_{ca}$ is the channel attention profile; $\hat{X}$ denotes the transposed and reshaped input feature map; $\sigma$ is the Sigmoid activation function, compressing the values of the feature map into the interval [0, 1]; $W_1$ is the weight of the first fully connected layer and $W_2$ the weight of the second fully connected layer; ReLU is the activation function $f(x) = \max(0, x)$, used to enhance the nonlinearity of the model;

the forward propagation process of the global attention module GAM is:

$$Y = F_{sa} \otimes F_{ca} \otimes X + \mathrm{Residual}$$

where $Y$ is the feature map output by forward propagation; $\otimes$ denotes element-wise multiplication; and Residual is the residual connection;
and step 3, inputting the images acquired in real time into the trained insulator detection model to obtain the insulator target.
2. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 1, characterized in that: in step 2, the neck network includes a convolutional layer Conv_6, a convolutional layer Conv_7, an upsampling layer Upsample_1, an upsampling layer Upsample_2, a C2f_5 module, a C2f_6 module, a C2f_7 module, a C2f_8 module, a concatenation layer Concat_1, a concatenation layer Concat_2, a concatenation layer Concat_3, and a concatenation layer Concat_4;
after the feature map F1 output by the spatial pyramid module SPPF passes through the upsampling layer Upsample_1, it is fused with the feature map F2 output by the C2f_3 module at the concatenation layer Concat_1; the feature map output by the concatenation layer Concat_1 undergoes feature extraction by the C2f_5 module and passes through the upsampling layer Upsample_2, and is then fused with the feature map F3 output by the C2f_2 module at the concatenation layer Concat_2; the feature map output by the concatenation layer Concat_2 undergoes feature extraction by the C2f_6 module and, after passing through the convolution layer Conv_6, is fused with the feature map output by the C2f_5 module at the concatenation layer Concat_3; the feature map output by the concatenation layer Concat_3 undergoes feature extraction by the C2f_7 module and, after passing through the convolution layer Conv_7, is fused with the feature map F1 output by the spatial pyramid module SPPF at the concatenation layer Concat_4; the feature map output by the concatenation layer Concat_4 undergoes feature extraction by the C2f_8 module.
3. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 2, characterized in that: in step 2, the head network includes a target detection module Delect_1, a target detection module Delect_2, and a target detection module Delect_3;
the target detection module Delect_1 performs target detection on the features extracted by the C2f_6 module in the Neck network Neck and outputs a target 1; the target detection module Delect_2 performs target detection on the features extracted by the C2f_7 module and outputs a target 2; the target detection module Delect_3 performs target detection on the features extracted by the C2f_8 module and outputs a target 3; finally, target 1, target 2 and target 3 are fused and the final insulator target is output.
4. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 1, characterized in that: the loss function DIoU_max of the YOLOv8 model is:

$$L_{\mathrm{DIoU\_max}} = 1 - 1.3\,\mathrm{IoU} + \frac{\max\!\left( \rho_x^2, \rho_y^2 \right)}{c^2},\qquad \mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

where IoU is the area of the intersection of the prediction frame and the real frame (Area of Overlap) divided by the area of their union (Area of Union); $\rho_x^2 = \left( \frac{b1_{x1} + b1_{x2}}{2} - \frac{b2_{x1} + b2_{x2}}{2} \right)^2$ is the square of the distance on the x-axis between the center of the prediction frame and the center of the real frame, and $\rho_y^2 = \left( \frac{b1_{y1} + b1_{y2}}{2} - \frac{b2_{y1} + b2_{y2}}{2} \right)^2$ the square of the distance on the y-axis; $b1_{x1}$ is the x-coordinate of the left vertex of the prediction frame, $b1_{x2}$ the x-coordinate of its right vertex, $b1_{y1}$ the y-coordinate of its upper-edge vertex and $b1_{y2}$ the y-coordinate of its lower-edge vertex; $b2_{x1}$, $b2_{x2}$, $b2_{y1}$ and $b2_{y2}$ are the corresponding coordinates of the real frame; $c$ is the diagonal length of the minimum enclosing box containing the prediction frame and the real frame; and $L_{\mathrm{DIoU\_max}}$ is the loss function of the YOLOv8 model.
5. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 1, wherein: the specific steps of the step 1 include: collecting real-shot images of a power supply station; preprocessing the images by scaling them in equal proportion, mirror-flipping them, adding noise blocks of random size, and applying a color transformation; and manually labeling the insulators in the images to form a training image set.
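A sketch of this preprocessing chain, assuming images arrive as uint8 RGB NumPy arrays; the scale factor, noise-block sizes, and per-channel color-jitter range are placeholder values the patent does not specify:

```python
import numpy as np

rng = np.random.default_rng(0)

def resize_keep_ratio(img: np.ndarray, scale: float) -> np.ndarray:
    # Nearest-neighbour equal-proportion scaling (same factor on both axes)
    h, w = img.shape[:2]
    ys = (np.arange(int(h * scale)) / scale).astype(int)
    xs = (np.arange(int(w * scale)) / scale).astype(int)
    return img[ys][:, xs]

def preprocess(img: np.ndarray, scale: float = 0.5) -> np.ndarray:
    img = resize_keep_ratio(img, scale)
    img = img[:, ::-1].copy()                     # mirror flip
    h, w, c = img.shape
    for _ in range(int(rng.integers(1, 4))):      # noise blocks of random size
        bh, bw = int(rng.integers(8, h // 4)), int(rng.integers(8, w // 4))
        y, x = int(rng.integers(0, h - bh)), int(rng.integers(0, w - bw))
        img[y:y + bh, x:x + bw] = rng.integers(0, 256, (bh, bw, c), dtype=np.uint8)
    gains = rng.uniform(0.8, 1.2, size=c)         # simple color transformation
    return np.clip(img.astype(np.float32) * gains, 0, 255).astype(np.uint8)
```

Bounding-box labels must be flipped and rescaled with the same parameters, which is why such augmentation is normally applied before (or jointly with) the manual annotation export.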
CN202311163428.3A 2023-09-11 2023-09-11 Insulator detection method based on target detection algorithm and attention mechanism Active CN116895030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311163428.3A CN116895030B (en) 2023-09-11 2023-09-11 Insulator detection method based on target detection algorithm and attention mechanism

Publications (2)

Publication Number Publication Date
CN116895030A (en) 2023-10-17
CN116895030B (en) 2023-11-17

Family

ID=88313843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311163428.3A Active CN116895030B (en) 2023-09-11 2023-09-11 Insulator detection method based on target detection algorithm and attention mechanism

Country Status (1)

Country Link
CN (1) CN116895030B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173542B (en) * 2023-10-26 2024-05-28 山东易图信息技术有限公司 Method and system for detecting and optimizing water floaters based on YOLOV model
CN117765373B (en) * 2024-02-22 2024-05-14 山东大学 Lightweight road crack detection method and system with self-adaptive crack size

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3016953A1 (en) * 2017-09-07 2019-03-07 Comcast Cable Communications, Llc Relevant motion detection in video
CN112287978B (en) * 2020-10-07 2022-04-15 武汉大学 Hyperspectral remote sensing image classification method based on self-attention context network
US11568543B2 (en) * 2021-03-10 2023-01-31 Western Digital Technologies, Inc. Attention masks in neural network video processing
CN114202696B (en) * 2021-12-15 2023-01-24 安徽大学 SAR target detection method and device based on context vision and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183414A (en) * 2020-09-29 2021-01-05 Nanjing University of Information Science and Technology Weak supervision remote sensing target detection method based on mixed hole convolution
CN113486897A (en) * 2021-07-29 2021-10-08 Liaoning Technical University Semantic segmentation method for convolution attention mechanism up-sampling decoding
CN113762081A (en) * 2021-08-09 2021-12-07 Jiangsu University Granary pest detection method based on YOLOv5s
CN114842503A (en) * 2022-04-18 2022-08-02 Nanjing University of Science and Technology Helmet detection method based on YOLOv5 network
CN115578626A (en) * 2022-07-07 2023-01-06 Fuzhou University Multi-scale image tampering detection method based on mixed attention mechanism
CN115272828A (en) * 2022-08-11 2022-11-01 Institute of Agricultural Economics and Information, Henan Academy of Agricultural Sciences Intensive target detection model training method based on attention mechanism
CN115631411A (en) * 2022-09-28 2023-01-20 Xi'an Polytechnic University Method for detecting damage of insulator in different environments based on STEN network
CN115661607A (en) * 2022-09-29 2023-01-31 Henan Yunxun Intelligent Technology Research Institute Co., Ltd. Small target identification method based on improved YOLOv5
CN115588165A (en) * 2022-10-26 2023-01-10 Construction Branch of State Grid Chongqing Electric Power Company Tunnel worker safety helmet detection and face recognition method
CN115880660A (en) * 2022-12-27 2023-03-31 Nanjing University of Posts and Telecommunications Track line detection method and system based on structural characterization and global attention mechanism
CN116152342A (en) * 2023-03-10 2023-05-23 Shandong University Guideboard registration positioning method based on gradient
CN116630798A (en) * 2023-05-16 2023-08-22 Shanghai Jiao Tong University SAR image aircraft target detection method based on improved YOLOv5

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression; Shuangqing Zhang et al.; The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), Vol. 34 (No. 7); pp. 12993-13000 *
Fast detection and recognition technology for ship targets in SAR images; Zhang Lianzhong; China Master's Theses Full-text Database, Engineering Science and Technology II (No. 1); p. C036-196 *
UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios; Gang Wang et al.; Sensors, Vol. 23 (No. 16); pp. 1-27 *
Word self-update contrastive adversarial networks for text-to-image synthesis; Jian Xiao et al.; Neural Networks, Vol. 167; pp. 433-444 *
Research and design of a citrus pest and disease recognition system based on YOLOv8; Gao Weifeng; Journal of Smart Agriculture, Vol. 3 (No. 15); pp. 27-30 *
Improved YOLOv5s smoke and fire detection algorithm for coal mines; Liu Chunxia et al.; Computer Engineering and Applications, Vol. 59 (No. 17); pp. 286-294 *

Similar Documents

Publication Publication Date Title
CN111259930B (en) General target detection method of self-adaptive attention guidance mechanism
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN116895030B (en) Insulator detection method based on target detection algorithm and attention mechanism
CN109543606B (en) Human face recognition method with attention mechanism
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN112070729B (en) Anchor-free remote sensing image target detection method and system based on scene enhancement
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN113052109A (en) 3D target detection system and 3D target detection method thereof
CN113159120A (en) Contraband detection method based on multi-scale cross-image weak supervision learning
CN111612017A (en) Target detection method based on information enhancement
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN111353544A (en) Improved Mixed Pooling-YOLOv3-based target detection method
CN111797841A (en) Visual saliency detection method based on depth residual error network
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
CN111738114A (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN115171074A (en) Vehicle target identification method based on multi-scale yolo algorithm
CN111985488B (en) Target detection segmentation method and system based on offline Gaussian model
CN112837281A (en) Pin defect identification method, device and equipment based on cascade convolutional neural network
CN113962332B (en) Salient target identification method based on self-optimizing fusion feedback
CN115331081A (en) Image target detection method and device
Pang et al. PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
CN111950586B (en) Target detection method for introducing bidirectional attention
Li et al. Focus on local: transmission line defect detection via feature refinement
Li et al. Object recognition for augmented reality applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant