CN117392568A - Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene - Google Patents

Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene Download PDF

Info

Publication number
CN117392568A
CN117392568A (application number CN202311381395.XA)
Authority
CN
China
Prior art keywords
feature
power transformation
convolution
output
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311381395.XA
Other languages
Chinese (zh)
Inventor
朱江
范崇高
许海霞
余洪山
张�杰
王昭鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202311381395.XA priority Critical patent/CN117392568A/en
Publication of CN117392568A publication Critical patent/CN117392568A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/86Arrangements for image or video recognition or understanding using pattern recognition or machine learning using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Remote Sensing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for unmanned aerial vehicle (UAV) inspection of power transformation equipment in complex scenes, which mainly addresses the low detection accuracy and high computing-resource consumption of prior-art methods. The method comprises collecting an infrared image of the inspected power transformation equipment and inputting it into a pre-trained infrared image detection model of the power transformation equipment to obtain the detection result. The detection model comprises a backbone feature extraction network, a multi-scale feature aggregation network, and a detection head network. The invention improves the detection accuracy of power transformation equipment in aerial infrared images while requiring few computing resources, and can be applied to autonomous UAV inspection of power transmission lines for daily monitoring of the operating state of power transformation equipment.

Description

Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene
Technical Field
The invention relates to infrared image detection of power transformation equipment, and in particular to a method for UAV inspection of power transformation equipment in complex scenes.
Background
As important infrastructure of the power system, power transformation equipment transforms voltage and current and receives and distributes electric energy. Such equipment mainly includes lightning arresters, transformers, disconnecting switches, and post porcelain insulators. Operating outdoors for long periods under high voltage and large current, this equipment is prone to local overheating faults, which seriously threaten the safe and stable operation of the power grid. Monitoring the heating state of power transformation equipment is therefore a basic task of daily inspection. Because the equipment is erected at height, a UAV carrying an infrared thermal imager to acquire aerial images is an effective means of identifying abnormal heating on the equipment surface. In recent years, much research has addressed the visual task of automatic inspection of power transformation equipment. Different devices have different surface temperature distribution characteristics, so automatic detection and diagnosis of heating states first requires quickly and accurately identifying and locating the equipment in infrared images.
In recent years, with the development of convolutional neural networks, general object detection networks such as Faster R-CNN, SSD, and YOLO have been proposed and achieve significant results in multi-scale object detection on visible-light images. Because of the differences between infrared and visible-light images, these general detection models are difficult to apply directly to detecting power transformation equipment in infrared images. Researchers have accordingly improved the general models for different application scenarios. Although these methods improve target detection in UAV infrared aerial images to some extent, detection of power transformation equipment in such images still faces the following challenges:
(1) Unbalanced spatial distribution of targets. Under the UAV viewing angle, power transformation equipment tends to concentrate in the middle region of the image, resulting in dense, mutually occluding targets in parts of the image.
(2) Varying UAV viewing angles. Under different viewing angles, the appearance of the same class of power transformation equipment differs; conversely, different classes can appear very similar from a particular viewing angle. This combination of large intra-class differences and small inter-class differences makes it difficult for a model to distinguish targets effectively.
(3) Varying target scales. Because of changes in flying height, the scale of the same class of target in UAV images varies sharply, and large and small targets may appear in the same image. The model must detect targets across widely varying scales.
(4) Limited on-board resources. Due to payload, power-consumption, and cost constraints, the computing resources on board a UAV are very limited. Deploying a detection model on such a resource-constrained platform requires reducing the model's parameters, computational complexity, and inference time while maintaining detection accuracy.
Under these challenges, existing methods cannot simultaneously achieve low computing-resource consumption and fast, accurate detection of power transformation equipment in aerial infrared images.
Disclosure of Invention
The technical problem to be solved by the invention: in view of the problems of the prior art, the invention provides a method for UAV inspection of power transformation equipment in complex scenes, which addresses the low infrared-image detection accuracy of existing methods while keeping computing-resource consumption low.
In order to solve the technical problems, the invention adopts the following technical scheme:
a method for UAV inspection of power transformation equipment in complex scenes comprises collecting an infrared image of the inspected power transformation equipment and inputting it into a pre-trained infrared image detection model of the power transformation equipment to obtain the detection result. The detection model comprises:
the backbone feature extraction network, used for extracting features at different levels from the input infrared image of the power transformation equipment. The backbone network GSCSPLnet adopts a ten-layer structure: the first layer is a Focus module; the second, fourth, sixth, and eighth layers are GSConv_BN_SiLU modules; the third, fifth, seventh, and tenth layers are GSCSP-L modules; and the ninth layer is the multi-level receptive field feature enhancement module MRFFEM. A GSCSP-L module is a sequential cascade of a GSCSP block and a long-distance feature capture attention mechanism LDFC: the output of the GSCSP feeds the input of the LDFC, and the output of the LDFC is the output of the GSCSP-L module. A GSConv_BN_SiLU module is a sequential cascade of a lightweight convolution GSConv, BatchNorm2d, and the SiLU activation function. The GSCSP adopts the basic CSPNet structure; each GSCSP consists of X GSResblocks and several GSConvs. A GSResblock consists of a main branch and a residual branch: the main branch is two GSConv_BN_SiLU modules in series, and the residual branch is a sequential series of a depthwise-separable convolution DWSConv, BatchNorm2d, and the SiLU activation function. The feature maps extracted by the third, fifth, sixth, seventh, eighth, and tenth layers of GSCSPLnet are denoted C2, C3, C4', C4, C5', and C5, respectively. The inputs of the MRFFEM are the feature maps C4' and C5' extracted by the backbone network, and the output of the MRFFEM passes through the tenth-layer GSCSP-L to yield C5;
the multi-scale feature aggregation network MFEN, used for performing multi-scale, multi-level feature aggregation on the feature maps C2, C3, C4, and C5 extracted by the backbone network to obtain feature maps P3, P4, and P5;
and the detection head network, used for performing detection on the aggregated features P3, P4, and P5 to obtain the detection result of the power transformation equipment.
Optionally, the long-distance feature capture attention mechanism LDFC operates as follows: first, the input feature map Z passes sequentially through an average pooling layer, a 1×1 convolution, and a linear activation function to obtain feature maps Ẑ_h and Ẑ_w. Then Ẑ_h is fed into a vertical adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_h in the vertical direction, and Ẑ_w is fed into a horizontal adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_w in the horizontal direction. Finally, a Sigmoid function and an upsampling operation convert z_h and z_w into globally learnable spatial weights α_H(h) and α_W(w).
Optionally, the multi-level receptive field feature enhancement module MRFFEM comprises an auxiliary-layer localization feature enhancement block ALFEB and a deep semantic pyramid pooling block DSPPB. The ALFEB has one input, three data processing branches, and one output; its input is the feature map C4'. Branch I of the ALFEB is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. Branch II is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. Branch III is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 7, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f1. The DSPPB likewise has one input, three data processing branches, and one output; its input is the feature map C5'. Branch I of the DSPPB is a sequential cascade of a 1×1 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 1, and SimAM attention. Branch II is a sequential cascade of a 3×3 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. Branch III is a sequential cascade of two 3×3 GSConvs, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f2. f1 and f2 are then concatenated and passed through a 1×1 convolution to obtain the final output.
Optionally, the multi-scale feature aggregation network MFEN consists of three stacked cross-scale fusion and reassembly feature aggregation blocks CARE and one simplified fusion and reassembly feature aggregation block SCARE. The inputs of the first CARE are the feature maps C5, C4, and C3: C3 is downsampled to obtain a feature map N3 at the same scale as C4, C5 is upsampled to obtain a feature map L5 at the same scale as C4, then N3, C4, and L5 are concatenated and passed through a GSCSP-L to obtain the output P4_1. The inputs of the second CARE are P4_1, C3, and C2: C2 is downsampled to obtain N2 at the same scale as C3, P4_1 is upsampled to obtain L4_1 at the same scale as C3, then N2, C3, and L4_1 are concatenated and passed through a GSCSP-L to obtain the output P3. The inputs of the third CARE are C5, P4_1, and P3: P3 is downsampled to obtain N3_1 at the same scale as P4_1, C5 is upsampled to obtain L5_1 at the same scale as P4_1, then N3_1, P4_1, and L5_1 are concatenated and passed through a GSCSP-L to obtain the output P4. The inputs of the SCARE are C5 and P4: C5 passes through the feature assistance module AFSB to obtain a feature map S5 at the same scale as C5, P4 is downsampled to obtain N4 at the same scale as C5, then S5, N4, and C5 are concatenated and passed through a GSCSP-L to obtain the output P5. Finally, P3, P4, and P5 are output to the first, second, and third detection heads of the detection head network, respectively.
Optionally, the feature assistance module AFSB consists of a main branch and a residual branch. The main branch contains two sub-branches: the first is a sequential cascade of a 1×1 convolution and a Softmax function, and the second is an adaptive pooling layer. The outputs of the two sub-branches are multiplied and then passed through a 1×1 convolution, LayerNorm, and the ReLU activation function to obtain the main-branch output. Finally, the main-branch output is concatenated with the residual branch to obtain the AFSB output.
Optionally, the detection head network includes a first detection head YOLO_Head1, a second detection head YOLO_Head2, and a third detection head YOLO_Head3. YOLO_Head1 performs detection on the 80×80 feature map, YOLO_Head2 on the 40×40 feature map, and YOLO_Head3 on the 20×20 feature map, each yielding detection results for the power transformation equipment.
Optionally, before the step of inputting the infrared image of the power transformation equipment into the pre-trained detection model, the method further comprises training the infrared image detection model of the power transformation equipment:
s1) acquiring infrared image samples and labeling them to construct an infrared image dataset of the power transformation equipment;
s2) constructing an infrared image detection model of the power transformation equipment;
s3) training the infrared image detection model of the power transformation equipment by adopting preset training parameters, and finally obtaining the pre-trained infrared image detection model of the power transformation equipment.
Optionally, the preset training parameters are: a standard SGD optimizer with an initial learning rate of 0.01 and momentum of 0.937; the weight decay is fixed at 5×10⁻⁴; and the model is trained for a total of 400 epochs.
In addition, the invention also provides an apparatus for UAV inspection of power transformation equipment in complex scenes, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the above method for UAV inspection of power transformation equipment in complex scenes.
Furthermore, the invention also provides a computer-readable storage medium storing a computer program to be executed by a computer device to implement the steps of the method for UAV inspection of power transformation equipment in complex scenes.
Compared with the prior art, the invention has the following advantages: it improves the detection accuracy of power transformation equipment in UAV aerial infrared images at low computing-resource cost, and can be applied to autonomous UAV inspection of substations and power transmission lines for daily monitoring of the operating state of power transformation equipment.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a network structure of an infrared image detection model of a power transformation device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of GSCSP-L according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the structure of the attention of an LDFC in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a multi-level receptive field feature enhancement module MRFFEM in an embodiment of the invention.
Detailed Description
This embodiment provides a method for UAV inspection of power transformation equipment in complex scenes; the flow is shown in Fig. 1. The method comprises collecting an infrared image of the inspected power transformation equipment, constructing an infrared image detection model of the power transformation equipment, pre-training the model, and inputting the infrared image into the pre-trained model to obtain the detection result. The specific steps are as follows:
s1) acquiring infrared image samples and labeling them to construct an infrared image dataset of the power transformation equipment. In this embodiment the dataset contains seven classes of power transformation equipment: lightning arrester 1 (PB1), lightning arrester 2 (PB2), current transformer 1 (TA1), current transformer 2 (TA2), voltage transformer (TV), disconnecting switch (DS), and post porcelain insulator (PI). The dataset is divided into a training-validation set and a test set at a 9:1 ratio, and the training-validation set is further divided into a training set and a validation set at a 9:1 ratio.
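A minimal sketch of the two-stage 9:1 split described above; the shuffling and seed are illustrative assumptions, not patent content:

```python
import random

def split_dataset(image_paths, seed=0):
    """Split a list of image paths 9:1 into (train+val, test),
    then split train+val 9:1 into (train, val)."""
    rng = random.Random(seed)
    paths = list(image_paths)
    rng.shuffle(paths)
    n_test = len(paths) // 10              # 9:1 -> 10% held out for testing
    test, trainval = paths[:n_test], paths[n_test:]
    n_val = len(trainval) // 10            # second 9:1 split for validation
    val, train = trainval[:n_val], trainval[n_val:]
    return train, val, test
```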
S2) constructing an infrared image detection model of the power transformation equipment;
s3) training the infrared image detection model of the power transformation equipment by adopting preset training parameters, and finally obtaining the pre-trained infrared image detection model of the power transformation equipment.
In addition, the training parameters preset in step S3) of this embodiment are: a standard SGD optimizer with an initial learning rate of 0.01 and momentum of 0.937; the weight decay is fixed at 5×10⁻⁴; and the model is trained for a total of 400 epochs.
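These settings map directly onto a standard PyTorch SGD configuration; the stand-in model and the empty loop body below are placeholders, not patent content:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # stand-in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.937, weight_decay=5e-4)

for epoch in range(400):  # 400 training epochs, as stated above
    ...                   # forward pass, loss, backward pass, optimizer.step()
```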
The loss function in this embodiment combines three terms: a classification loss L_cls, a confidence loss L_obj, and a localization loss L_CIoU. The total loss of the detection model is defined as follows:

LOSS = λ1·L_cls + λ2·L_obj + λ3·L_CIoU

where L_cls uses the binary cross-entropy function (BCE Loss) to compute the classification loss between the positive samples in the predicted boxes and the ground-truth boxes; L_obj, the confidence loss over whether each predicted box contains an object, also uses BCE Loss; and L_CIoU uses CIoU Loss (Complete Intersection over Union) to compute the localization error between the positive samples in the predicted boxes and the ground-truth boxes.
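A minimal sketch of the weighted total loss defined above; the weights λ1, λ2, λ3 are placeholders (their values are not disclosed here), and the CIoU term is assumed to be computed elsewhere:

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # BCE over raw logits

def total_loss(cls_pred, cls_target, obj_pred, obj_target, l_ciou,
               lam1=1.0, lam2=1.0, lam3=1.0):
    l_cls = bce(cls_pred, cls_target)  # classification loss on positive samples
    l_obj = bce(obj_pred, obj_target)  # objectness/confidence loss on all boxes
    return lam1 * l_cls + lam2 * l_obj + lam3 * l_ciou
```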
The network structure of the infrared image detection model of the power transformation equipment is shown in Fig. 2 and comprises:
the backbone feature extraction network, which extracts features at different levels from the input infrared image of the power transformation equipment;
the multi-scale feature aggregation network MFEN, which performs multi-scale, multi-level feature aggregation on the feature maps from the backbone network and sends the resulting maps to the detection head network;
and the detection head network, which performs detection on the aggregated feature maps to obtain the detection result of the power transformation equipment.
The backbone feature extraction network realizes multi-level feature extraction. As an optional implementation, as shown in Fig. 2, the backbone network in this embodiment is GSCSPLnet, which adopts a ten-layer structure: the first layer is a Focus module; the second, fourth, sixth, and eighth layers are GSConv_BN_SiLU modules; the third, fifth, seventh, and tenth layers are GSCSP-L modules; and the ninth layer is the multi-level receptive field feature enhancement module MRFFEM. The feature maps obtained as the input image passes through the third, fifth, sixth, seventh, eighth, and tenth layers of GSCSPLnet are denoted C2, C3, C4', C4, C5', and C5, respectively. C4' and C5' serve as the inputs of the MRFFEM, whose output passes through the tenth-layer GSCSP-L to obtain C5. C2, C3, C4, and C5 are then fed into the multi-scale feature aggregation network MFEN for further feature processing.
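To make the data flow concrete, here is a structural skeleton in PyTorch in which every stage is stubbed with a plain convolution; the channel widths and strides are assumptions chosen to be consistent with the 80×80, 40×40, and 20×20 head resolutions used later, and the real Focus, GSConv_BN_SiLU, GSCSP-L, and MRFFEM internals are described in the surrounding text:

```python
import torch
import torch.nn as nn

def stage(c_in, c_out, stride):
    # stand-in for one backbone stage (Conv + BN + SiLU)
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())

class GSCSPLnetSkeleton(nn.Module):
    def __init__(self, ch=(64, 128, 256, 512)):
        super().__init__()
        self.focus = stage(3, ch[0], 2)      # layer 1: Focus (stubbed)
        self.conv2 = stage(ch[0], ch[0], 2)  # layer 2: GSConv_BN_SiLU
        self.csp3 = stage(ch[0], ch[0], 1)   # layer 3: GSCSP-L -> C2
        self.conv4 = stage(ch[0], ch[1], 2)  # layer 4
        self.csp5 = stage(ch[1], ch[1], 1)   # layer 5: -> C3
        self.conv6 = stage(ch[1], ch[2], 2)  # layer 6: -> C4'
        self.csp7 = stage(ch[2], ch[2], 1)   # layer 7: -> C4
        self.conv8 = stage(ch[2], ch[3], 2)  # layer 8: -> C5'
        self.down9 = stage(ch[2], ch[3], 2)  # layer 9 stub: bring C4' to C5' scale
        self.fuse9 = nn.Conv2d(2 * ch[3], ch[3], 1)  # layer 9 stub: MRFFEM fusion
        self.csp10 = stage(ch[3], ch[3], 1)  # layer 10: GSCSP-L -> C5

    def forward(self, x):
        c2 = self.csp3(self.conv2(self.focus(x)))
        c3 = self.csp5(self.conv4(c2))
        c4p = self.conv6(c3)                 # C4'
        c4 = self.csp7(c4p)                  # C4
        c5p = self.conv8(c4)                 # C5'
        m = self.fuse9(torch.cat([self.down9(c4p), c5p], dim=1))  # layer 9 stub
        c5 = self.csp10(m)                   # C5
        return c2, c3, c4, c5
```

For a 640×640 input this yields C2 through C5 at strides 4, 8, 16, and 32, matching the 80×80, 40×40, and 20×20 head resolutions below.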
As shown in Fig. 3, a GSCSP-L module is a sequential cascade of a GSCSP block and the long-distance feature capture attention mechanism LDFC. GSCSPLnet contains four GSCSP-L modules, located at the third, fifth, seventh, and tenth layers. Each GSCSP-L splits the input feature map into two parts: the main part is a sequential cascade of a GSConv_BN_SiLU and X residual modules GSResblock, where X takes the values {3, 9, 9, 3} in the four GSCSP-L modules respectively; the other part passes through one GSConv_BN_SiLU and is concatenated directly with the main part, after which the LDFC attention is cascaded to give the GSCSP-L output. Within GSResblock, the residual edge adds a depthwise-separable convolution DWSConv, and the main part also uses the lightweight GSConv_BN_SiLU.
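A hedged sketch of GSConv_BN_SiLU and GSResblock follows. The patent names GSConv but does not spell out its internals; the formulation below (a dense convolution on half the output channels, a depthwise convolution on that half, then concatenation, omitting the usual channel shuffle) is the common slim-neck one and should be read as an assumption:

```python
import torch
import torch.nn as nn

class GSConvBNSiLU(nn.Module):
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(  # dense conv producing half the channels
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dw = nn.Sequential(     # depthwise conv on the dense half
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        a = self.dense(x)
        return torch.cat([a, self.dw(a)], dim=1)

class GSResblock(nn.Module):
    """Main branch: two cascaded GSConv_BN_SiLU; residual branch:
    depthwise-separable conv (DWSConv) + BatchNorm2d + SiLU."""
    def __init__(self, c):
        super().__init__()
        self.main = nn.Sequential(GSConvBNSiLU(c, c), GSConvBNSiLU(c, c))
        self.res = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1, groups=c, bias=False),  # depthwise
            nn.Conv2d(c, c, 1, bias=False),                  # pointwise
            nn.BatchNorm2d(c), nn.SiLU())

    def forward(self, x):
        return self.main(x) + self.res(x)
```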
The long-distance feature capture attention mechanism LDFC is shown in Fig. 4 and, in this embodiment, operates as follows. Let the original input feature map be Z ∈ R^{C×H×W}, where H is the height, W the width, and C the number of channels of the feature map. Z passes sequentially through a downsampling average pooling layer F_AvgPool, a 1×1 convolution F_Conv, and a linear activation function s, yielding feature maps Ẑ_h and Ẑ_w of size C'×H/2×W/2.

The feature map Ẑ_h is then fed into a vertical adaptive pooling layer, a central feature aggregation layer Γ_CFA, and a 1×1 convolution F_Conv, obtaining a long-distance spatial feature map z_h of size C×H/2×1 that captures the vertical direction. The trunk of the central feature aggregation layer is a cascade of a spatially separable convolution F_SSConv and a batch normalization layer F_BN, giving a feature f1; f1 then passes through the activation function F_ReLU to give f2; finally f1 and f2 are multiplied element-wise to give the output of the central feature aggregation layer:

Γ_CFA = F_BN(F_SSConv(x)) ⊙ F_ReLU(F_BN(F_SSConv(x)))

The feature map Ẑ_w is fed into a horizontal adaptive pooling layer, a central feature aggregation layer Γ_CFA, and a 1×1 convolution F_Conv, obtaining a long-distance spatial feature map z_w of size C×1×W/2 that captures the horizontal direction.

Finally, a Sigmoid function σ and an upsampling operation F_Upsample convert z_h and z_w into globally learnable spatial weights α_H(h) and α_W(w), which are multiplied with the input feature map Z to obtain the final attention feature map Y_C(h, w).
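Under the definitions above, the whole LDFC flow can be sketched as follows; the reduced width C', the kernel of the spatially separable convolution inside the central feature aggregation layer, and the sharing of one tensor for Ẑ_h and Ẑ_w are assumptions where the text leaves details open:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CFA(nn.Module):
    """Central feature aggregation: Γ_CFA = BN(SSConv(x)) ⊙ ReLU(BN(SSConv(x))).
    The spatially separable conv is stubbed as a single 3x1 (or 1x3) conv."""
    def __init__(self, c, vertical=True):
        super().__init__()
        k, p = ((3, 1), (1, 0)) if vertical else ((1, 3), (0, 1))
        self.conv = nn.Conv2d(c, c, k, 1, p, bias=False)
        self.bn = nn.BatchNorm2d(c)

    def forward(self, x):
        f1 = self.bn(self.conv(x))
        return f1 * F.relu(f1)  # element-wise product of f1 and f2 = ReLU(f1)

class LDFC(nn.Module):
    def __init__(self, c, c_mid=None):
        super().__init__()
        c_mid = c_mid or c  # reduced width C' (assumption)
        # F_AvgPool + 1x1 F_Conv; the linear activation s is the identity
        self.pre = nn.Sequential(nn.AvgPool2d(2), nn.Conv2d(c, c_mid, 1))
        self.cfa_h, self.cfa_w = CFA(c_mid, True), CFA(c_mid, False)
        self.proj_h = nn.Conv2d(c_mid, c, 1)
        self.proj_w = nn.Conv2d(c_mid, c, 1)

    def forward(self, z):
        h, w = z.shape[2], z.shape[3]
        zh = self.pre(z)   # Ẑ_h of size C' x H/2 x W/2
        zw = zh            # Ẑ_w; shared tensor here for brevity
        # vertical branch: pool the width away -> z_h of size C x H/2 x 1
        ah = self.proj_h(self.cfa_h(F.adaptive_avg_pool2d(zh, (zh.shape[2], 1))))
        # horizontal branch: pool the height away -> z_w of size C x 1 x W/2
        aw = self.proj_w(self.cfa_w(F.adaptive_avg_pool2d(zw, (1, zw.shape[3]))))
        # Sigmoid + upsampling -> global spatial weights α_H(h), α_W(w)
        ah = F.interpolate(torch.sigmoid(ah), size=(h, 1))
        aw = F.interpolate(torch.sigmoid(aw), size=(1, w))
        return z * ah * aw  # Y_C(h, w)
```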
As shown in Fig. 5, the multi-level receptive field feature enhancement module MRFFEM in GSCSPLnet comprises, in this embodiment, an auxiliary-layer localization feature enhancement block ALFEB and a deep semantic pyramid pooling block DSPPB. The ALFEB has three data processing branches, with C4' (of size (40, 40, 256)) as their common input. The first branch of the ALFEB is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. The second branch is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. The third branch is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 7, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f1 of size (20, 20, 512). The DSPPB also has three data processing branches, with C5' (of size (20, 20, 512)) as their common input. The first branch of the DSPPB is a sequential cascade of a 1×1 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 1, and SimAM attention. The second branch is a sequential cascade of a 3×3 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. The third branch is a sequential cascade of two 3×3 GSConvs, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f2 of size (20, 20, 512). f1 and f2 are concatenated and passed through a 1×1 convolution to obtain the final output of size (20, 20, 512).
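A sketch of the ALFEB branch pattern follows; the DSPPB follows the same concatenate-then-1×1 scheme with GSConv and adaptive pooling in place of the separable pairs. SimAM is parameter-free and implemented here in its standard energy form; channel widths and the stride needed to reach the stated (20, 20, 512) output are assumptions or elided:

```python
import torch
import torch.nn as nn

def simam(x, eps=1e-4):
    # parameter-free SimAM attention (standard energy-based formulation)
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n
    return x * torch.sigmoid(d / (4 * (v + eps)) + 0.5)

def sep_pair(c):
    # one 1x3 + 3x1 spatially separable pair
    return [nn.Conv2d(c, c, (1, 3), 1, (0, 1)), nn.Conv2d(c, c, (3, 1), 1, (1, 0))]

def dil(c, rate):
    # 3x3 dilated convolution with the given dilation rate
    return nn.Conv2d(c, c, 3, 1, padding=rate, dilation=rate)

class ALFEB(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.b1 = nn.Sequential(*sep_pair(c_in), dil(c_in, 3))                   # branch I
        self.b2 = nn.Sequential(*sep_pair(c_in), *sep_pair(c_in), dil(c_in, 5))  # branch II
        self.b3 = nn.Sequential(*sep_pair(c_in), dil(c_in, 7))                   # branch III
        self.fuse = nn.Conv2d(3 * c_in, c_out, 1)  # 1x1 conv after concatenation

    def forward(self, x):
        y = torch.cat([simam(self.b1(x)), simam(self.b2(x)), simam(self.b3(x))], dim=1)
        return self.fuse(y)  # -> f1 (downsampling to (20, 20, 512) elided)
```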
As shown in Fig. 2, the multi-scale feature aggregation network MFEN in this embodiment consists of three stacked cross-scale fusion and reassembly feature aggregation blocks CARE and one simplified fusion and reassembly feature aggregation block SCARE. The inputs of the first CARE are C5, C4, and C3: C3 is downsampled to obtain a feature map N3 at the same scale as C4, C5 is upsampled to obtain a feature map L5 at the same scale as C4, then N3, C4, and L5 are concatenated and passed through a GSCSP-L to obtain the output P4_1. The inputs of the second CARE are P4_1, C3, and C2: C2 is downsampled to obtain N2 at the same scale as C3, P4_1 is upsampled to obtain L4_1 at the same scale as C3, then N2, C3, and L4_1 are concatenated and passed through a GSCSP-L to obtain the output P3. The inputs of the third CARE are C5, P4_1, and P3: P3 is downsampled to obtain N3_1 at the same scale as P4_1, C5 is upsampled to obtain L5_1 at the same scale as P4_1, then N3_1, P4_1, and L5_1 are concatenated and passed through a GSCSP-L to obtain the output P4. The inputs of the SCARE are C5 and P4: C5 passes through the feature assistance module AFSB to obtain a feature map S5 at the same scale as C5, P4 is downsampled to obtain N4 at the same scale as C5, then S5, N4, and C5 are concatenated and passed through a GSCSP-L to obtain the output P5. Finally, P3, P4, and P5 are output to the first, second, and third detection heads of the detection head network, respectively.
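One CARE step can be sketched as below, with the GSCSP-L after concatenation stubbed as a 1×1 convolution; channel widths are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARE(nn.Module):
    def __init__(self, ch_fine, ch_mid, ch_coarse, ch_out):
        super().__init__()
        self.down = nn.Conv2d(ch_fine, ch_fine, 3, 2, 1)  # downsample the finer map
        self.fuse = nn.Conv2d(ch_fine + ch_mid + ch_coarse, ch_out, 1)  # GSCSP-L stub

    def forward(self, fine, mid, coarse):
        n = self.down(fine)                            # e.g. N3 from C3
        l = F.interpolate(coarse, size=mid.shape[2:])  # e.g. L5 from C5
        return self.fuse(torch.cat([n, mid, l], dim=1))  # e.g. P4_1

# e.g. p4_1 = CARE(128, 256, 512, 256)(c3, c4, c5) for the first CARE above
```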
The MFEN structure introduces a feature assistance module AFSB, as shown in Fig. 2. In this embodiment, the AFSB consists of a main branch and a residual branch. The main branch contains two sub-branches: the first is a sequential cascade of a 1×1 convolution and a Softmax function; the second is an adaptive pooling layer. The outputs of the two sub-branches are multiplied and then passed through a 1×1 convolution, LayerNorm, and the ReLU activation function to obtain the main-branch output. Finally, the main-branch output is concatenated with the residual branch to obtain the AFSB output.
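A sketch of the AFSB under stated assumptions: the Softmax is taken over spatial positions, the adaptive pooling output size is kept equal to the input, and GroupNorm with one group stands in for LayerNorm on convolutional feature maps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFSB(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.attn = nn.Conv2d(c, c, 1)  # sub-branch 1: 1x1 conv (Softmax applied below)
        self.out = nn.Conv2d(c, c, 1)
        self.norm = nn.GroupNorm(1, c)  # stand-in for LayerNorm on conv features

    def forward(self, x):
        b, c, h, w = x.shape
        a = torch.softmax(self.attn(x).flatten(2), dim=-1).view(b, c, h, w)  # spatial Softmax
        p = F.adaptive_avg_pool2d(x, (h, w))     # sub-branch 2: adaptive pooling
        m = F.relu(self.norm(self.out(a * p)))   # main-branch output
        return torch.cat([m, x], dim=1)          # concatenate with residual branch
```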
Finally, the three outputs P3, P4, and P5 of the multi-scale feature aggregation network are passed into the detection head network for final prediction. As shown in Fig. 2, the detection head network in this embodiment includes a first detection head YOLO_Head1, a second detection head YOLO_Head2, and a third detection head YOLO_Head3. YOLO_Head1 performs detection on the 80×80 feature map, YOLO_Head2 on the 40×40 feature map, and YOLO_Head3 on the 20×20 feature map, each yielding detection results for the power transformation equipment.
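In the YOLO convention each head reduces to one 1×1 prediction convolution per scale; num_anchors = 3 is an assumption, while num_classes = 7 matches the seven device classes of this embodiment:

```python
import torch.nn as nn

def make_heads(channels=(128, 256, 512), num_anchors=3, num_classes=7):
    # one 1x1 prediction conv per scale:
    # per anchor, 4 box coordinates + 1 objectness + num_classes class scores
    out_ch = num_anchors * (5 + num_classes)
    return nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in channels)

# For a 640x640 input: heads[0] runs on P3 (80x80), heads[1] on P4 (40x40),
# heads[2] on P5 (20x20).
```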
To verify the high-accuracy detection achieved by the method of this example, its detection accuracy is compared with two-stage models (EfficientDet-D1 and Faster R-CNN), one-stage models (YOLOv8, YOLOv7, YOLOX-s, YOLOv5-s, ConvNeXt, and CenterNet), an anchor-free model (FCOS), and Transformer-based models (DETR and Swin). The evaluation uses the detection accuracy mAP50, the number of Parameters, the computational cost in FLOPs, and the detection speed in FPS; the experimental results are shown in Table 1.
Table 1. Comparison of the experimental results of the present invention and other algorithms on this dataset
In Table 1, mAP50 (%) denotes the average precision over the seven classes of power transformation equipment at an IoU threshold of 0.5. As Table 1 shows, the mAP50 of the method of this example is 99.2%, the best among all methods and 2.08% higher than the second-best YOLOX-s. The method adds only a small number of parameters and little computation while delivering high-accuracy detection, and its single-image detection speed reaches 105.3 frames per second, meeting the requirement of real-time detection.
To verify the effect of the long-distance feature capture attention mechanism LDFC in the method of this example, it is compared with mainstream attention modules: SimAM, SE, ECA, and CBAM. Each attention module was added to YOLOv5-s for testing; the experimental results are shown in Table 2.
Table 2. Experimental results of the proposed LDFC attention versus other attention modules
As Table 2 shows, for infrared images of power transformation equipment, introducing the LDFC guides the model to focus on important regions and improves detection performance relative to the other attention mechanisms. Moreover, for targets appearing at image edges, the LDFC establishes associations between image boundary regions and important regions, and its long-distance spatial capture recovers missing edge information, which further helps detection performance.
To verify the effect of the multi-level receptive field feature enhancement module MRFFEM in the method of this example, MRFFEM is compared with the mainstream spatial pyramid pooling (SPP) modules SPPF, SimSPPF, ASPP, RFB, SPPCSPC, and SPPFCSPC. SPPF is the serial SPP structure in YOLOv5; SimSPPF is the SPP structure in YOLOv6; ASPP builds multi-branch pooling on top of SPP using dilated convolutions with kernels of different sizes; SPPCSPC is the SPP structure in YOLOv7; and SPPFCSPC is the serial form of SPPCSPC. Each structure was added to YOLOv5 and tested; the results are shown in Table 3.
Table 3. Comparison of the experimental results of the proposed MRFFEM and other spatial pyramid pooling modules
As Table 3 shows, for infrared images of power transformation equipment, the MRFFEM uses convolution layers of different sizes to aggregate and extract features of the target at different granularities, enriching shape, texture, and structural features and improving the model's detection accuracy relative to the other SPP modules.
To verify the effect of the multi-scale feature aggregation network MFEN in the method of this example, CARE was split in YOLOv5-s into an upsampling-only variant OLUP (OnlyUpsample) and a downsampling-only variant OLDW (OnlyDownsample), and the feature assistance module AFSB was added separately for testing; the experimental results are shown in Table 4.
Table 4. Ablation comparison of the proposed MFEN sub-modules
As Table 4 shows, for infrared images of power transformation equipment, neither upsampling-only OLUP nor downsampling-only OLDW improves the model's detection accuracy, whereas the simultaneous upsampling and downsampling operations introduced by CARE within the MFEN do improve the network's detection performance. After the feature assistance module AFSB is introduced, semantic information useful to the model is supplemented, reducing optimization difficulty and improving the detection effect. In summary, the network model provided by this method for UAV inspection of power transformation equipment in complex scenes is both real-time and highly accurate, can be deployed in UAV inspection engineering applications, realizes automatic identification and localization of power transformation equipment in infrared images, and can be applied to daily inspection of equipment operating states in autonomous UAV inspection of substations and power transmission lines.
In addition, this embodiment provides an apparatus for UAV inspection of power transformation equipment in complex scenes, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the above method for UAV inspection of power transformation equipment in complex scenes.
This embodiment also provides a computer-readable storage medium storing a computer program to be executed by a computer device to implement the steps of the method for UAV inspection of power transformation equipment in complex scenes.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims (8)

1. A method for UAV inspection of power transformation equipment in complex scenes, characterized by comprising: collecting, by an unmanned aerial vehicle, an infrared image of the inspected power transformation equipment of a substation, and inputting the infrared image into a pre-trained infrared image detection model of the power transformation equipment to obtain the detection result of the power transformation equipment, wherein the step of constructing the infrared image detection model comprises:
constructing a backbone feature extraction network GSCSPLnet, which adopts a ten-layer structure: the first layer is a Focus module; the second, fourth, sixth, and eighth layers are GSConv_BN_SiLU modules; the third, fifth, seventh, and tenth layers are GSCSP-L modules; and the ninth layer is the multi-level receptive field feature enhancement module MRFFEM; the GSCSP-L module is a sequential cascade of a GSCSP block and a long-distance feature capture attention mechanism LDFC, wherein the output of the GSCSP is connected to the input of the LDFC and the output of the LDFC is the output of the GSCSP-L module; the GSConv_BN_SiLU module is a sequential cascade of a lightweight convolution GSConv, BatchNorm2d, and the SiLU activation function; the GSCSP adopts the basic CSPNet structure, each GSCSP consisting of X GSResblocks and several GSConvs, wherein a GSResblock consists of a main branch and a residual branch, the main branch being two GSConv_BN_SiLU modules in series and the residual branch being a sequential series of a depthwise-separable convolution DWSConv, BatchNorm2d, and the SiLU activation function; the feature maps extracted by the third, fifth, sixth, seventh, eighth, and tenth layers of GSCSPLnet are denoted C2, C3, C4', C4, C5', and C5, respectively; the inputs of the MRFFEM are the feature maps C4' and C5' extracted by the backbone network, and the output of the MRFFEM passes through a GSCSP-L to obtain C5;
constructing a multi-scale feature aggregation network MFEN for performing multi-scale, multi-level feature aggregation on the feature maps C2, C3, C4, and C5 extracted by the backbone network to obtain feature maps P3, P4, and P5;
and constructing a detection head network for performing detection on the aggregated features P3, P4, and P5 to obtain the detection result of the power transformation equipment.
2. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the long-distance feature capture attention mechanism LDFC operates as follows: first, the input feature map Z passes sequentially through an average pooling layer, a 1×1 convolution, and a linear activation function to obtain feature maps Ẑ_h and Ẑ_w; then Ẑ_h is fed into a vertical adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_h in the vertical direction, and Ẑ_w is fed into a horizontal adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_w in the horizontal direction; finally, a Sigmoid function and an upsampling operation convert z_h and z_w into globally learnable spatial weights α_H(h) and α_W(w).
3. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the multi-level receptive field feature enhancement module MRFFEM comprises an auxiliary-layer localization feature enhancement block ALFEB and a deep semantic pyramid pooling block DSPPB; the ALFEB has one input, three data processing branches, and one output, the input of the ALFEB being the feature map C4'; branch I of the ALFEB is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 3, and SimAM attention; branch II is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 5, and SimAM attention; branch III is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 7, and SimAM attention; finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f1; the DSPPB has one input, three data processing branches, and one output, the input of the DSPPB being the feature map C5'; branch I of the DSPPB is a sequential cascade of a 1×1 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 1, and SimAM attention; branch II is a sequential cascade of a 3×3 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 3, and SimAM attention; branch III is a sequential cascade of two 3×3 GSConvs, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 5, and SimAM attention; finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f2; f1 and f2 are then concatenated and passed through a 1×1 convolution to obtain the final output.
4. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the multi-scale feature aggregation network MFEN consists of three stacked cross-scale fusion and reassembly feature aggregation blocks CARE and one simplified fusion and reassembly feature aggregation block SCARE; the inputs of the first CARE are the feature maps C5, C4, and C3: C3 is downsampled to obtain a feature map N3 at the same scale as C4, C5 is upsampled to obtain a feature map L5 at the same scale as C4, then N3, C4, and L5 are concatenated and passed through a GSCSP-L to obtain the output P4_1; the inputs of the second CARE are P4_1, C3, and C2: C2 is downsampled to obtain N2 at the same scale as C3, P4_1 is upsampled to obtain L4_1 at the same scale as C3, then N2, C3, and L4_1 are concatenated and passed through a GSCSP-L to obtain the output P3; the inputs of the third CARE are C5, P4_1, and P3: P3 is downsampled to obtain N3_1 at the same scale as P4_1, C5 is upsampled to obtain L5_1 at the same scale as P4_1, then N3_1, P4_1, and L5_1 are concatenated and passed through a GSCSP-L to obtain the output P4; the inputs of the SCARE are C5 and P4: C5 passes through the feature assistance module AFSB to obtain a feature map S5 at the same scale as C5, P4 is downsampled to obtain N4 at the same scale as C5, then S5, N4, and C5 are concatenated and passed through a GSCSP-L to obtain the output P5; finally, P3, P4, and P5 are output to the first, second, and third detection heads of the detection head network, respectively.
5. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the detection head network includes a first detection head YOLO_Head1, a second detection head YOLO_Head2, and a third detection head YOLO_Head3; YOLO_Head1 performs detection on the 80×80 feature map, YOLO_Head2 on the 40×40 feature map, and YOLO_Head3 on the 20×20 feature map, each yielding detection results for the power transformation equipment.
6. The method for UAV inspection of power transformation equipment in complex scenes according to claim 4, wherein the feature assistance module AFSB consists of a main branch and a residual branch; the main branch contains two sub-branches: the first is a sequential cascade of a 1×1 convolution and a Softmax function, and the second is an adaptive pooling layer; the outputs of the two sub-branches are multiplied and then passed through a 1×1 convolution, LayerNorm, and the ReLU activation function to obtain the main-branch output; finally, the main-branch output is concatenated with the residual branch to obtain the AFSB output.
7. An apparatus for UAV inspection of power transformation equipment in complex scenes, comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to perform the steps of the method for UAV inspection of power transformation equipment in complex scenes according to any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for execution by a computer device to implement the steps of the method for UAV inspection of power transformation equipment in complex scenes according to any one of claims 1 to 6.
CN202311381395.XA 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene Pending CN117392568A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311381395.XA CN117392568A (en) 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311381395.XA CN117392568A (en) 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene

Publications (1)

Publication Number Publication Date
CN117392568A true CN117392568A (en) 2024-01-12

Family

ID=89469791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311381395.XA Pending CN117392568A (en) 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene

Country Status (1)

Country Link
CN (1) CN117392568A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975372A * 2024-03-29 2024-05-03 山东浪潮科学研究院有限公司 Construction site safety detection system and method based on YOLOv and Transformer encoder


Similar Documents

Publication Publication Date Title
CN109543606B (en) Human face recognition method with attention mechanism
CN111046821B (en) Video behavior recognition method and system and electronic equipment
CN112070729B (en) Anchor-free remote sensing image target detection method and system based on scene enhancement
CN110378222A (en) A kind of vibration damper on power transmission line target detection and defect identification method and device
CN104517103A (en) Traffic sign classification method based on deep neural network
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN105184229A (en) Online learning based real-time pedestrian detection method in dynamic scene
CN117392568A (en) Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene
CN113569672A (en) Lightweight target detection and fault identification method, device and system
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN109241814A (en) Pedestrian detection method based on YOLO neural network
Wanguo et al. Typical defect detection technology of transmission line based on deep learning
CN113128476A (en) Low-power consumption real-time helmet detection method based on computer vision target detection
CN116630668A (en) Method for identifying wearing abnormality of safety helmet in quick lightweight manner
CN112837281B (en) Pin defect identification method, device and equipment based on cascade convolution neural network
CN117132910A (en) Vehicle detection method and device for unmanned aerial vehicle and storage medium
CN116820131A (en) Unmanned aerial vehicle tracking method based on target perception ViT
Zhan Electric equipment inspection on high voltage transmission line via Mobile Net-SSD
Wang et al. Research on appearance defect detection of power equipment based on improved faster-rcnn
Li et al. Multi-scale feature extraction and fusion net: Research on UAVs image semantic segmentation technology
CN112364878A (en) Power line classification method based on deep learning under complex background
Li et al. Lightweight Real-time Object Detection System Based on Embedded AI Development Kit
Gan et al. Intelligent fault diagnosis with deep architecture
Wu et al. Detection of Defects in Power Grid Inspection Images Based on Multi-scale Fusion
CN112733632B (en) Robot control method based on face recognition and gesture recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination