CN117392568A - Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene - Google Patents

Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene Download PDF

Info

Publication number
CN117392568A
CN117392568A (application number CN202311381395.XA)
Authority
CN
China
Prior art keywords
feature
power transformation
convolution
output
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311381395.XA
Other languages
Chinese (zh)
Inventor
朱江
范崇高
许海霞
余洪山
张�杰
王昭鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202311381395.XA priority Critical patent/CN117392568A/en
Publication of CN117392568A publication Critical patent/CN117392568A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/86Arrangements for image or video recognition or understanding using pattern recognition or machine learning using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Remote Sensing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for unmanned aerial vehicle (UAV) inspection of power transformation equipment in complex scenes, which mainly addresses the low detection accuracy and high computing-resource consumption of prior-art methods. The method comprises collecting an infrared image of the inspected power transformation equipment and inputting it into a pre-trained infrared image detection model of the power transformation equipment to obtain the detection result. The detection model comprises a backbone feature extraction network, a multi-scale feature aggregation network, and a detection head network. The invention improves the detection accuracy of power transformation equipment in aerial infrared images while requiring few computing resources, and can be applied to autonomous UAV inspection of power transmission lines for daily monitoring of the operating state of power transformation equipment.

Description

Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene
Technical Field
The invention relates to infrared image detection of power transformation equipment, and in particular to a method for UAV inspection of power transformation equipment in complex scenes.
Background
As important infrastructure of the power system, power transformation equipment transforms voltage and current and receives and distributes electric energy. Such equipment mainly includes lightning arresters, transformers, disconnecting switches, and post porcelain insulators. Operating outdoors for long periods under high voltage and large current, this equipment is prone to local overheating faults, which seriously threaten the safe and stable operation of the power grid. Monitoring the heating state of power transformation equipment is therefore a basic task of daily inspection. Because the equipment is erected at height, a UAV carrying an infrared thermal imager to acquire aerial images is an effective means of identifying abnormal heating on the equipment surface. In recent years, much research has addressed the visual task of automatic inspection of power transformation equipment. Different devices have different surface temperature distribution characteristics, so automatic detection and diagnosis of heating states first requires quickly and accurately identifying and locating the equipment in infrared images.
In recent years, with the development of convolutional neural networks, general object detection networks such as Faster R-CNN, SSD, and YOLO have been proposed and achieve significant results in multi-scale object detection on visible-light images. Because of the differences between infrared and visible-light images, these general detection models are difficult to apply directly to detecting power transformation equipment in infrared images. Researchers have accordingly improved the general models for different application scenarios. Although these methods improve target detection in UAV infrared aerial images to some extent, detection of power transformation equipment in such images still faces the following challenges:
(1) Unbalanced spatial distribution of targets. Under the UAV viewing angle, power transformation equipment tends to concentrate in the middle region of the image, resulting in dense, mutually occluding targets in parts of the image.
(2) Varying UAV viewing angles. Under different viewing angles, the appearance of the same class of power transformation equipment differs; conversely, different classes can appear very similar from a particular viewing angle. This combination of large intra-class differences and small inter-class differences makes it difficult for a model to distinguish targets effectively.
(3) Varying target scales. Because of changes in flying height, the scale of the same class of target in UAV images varies sharply, and large and small targets may appear in the same image. The model must detect targets across widely varying scales.
(4) Limited on-board resources. Due to payload, power-consumption, and cost constraints, the computing resources on board a UAV are very limited. Deploying a detection model on such a resource-constrained platform requires reducing the model's parameters, computational complexity, and inference time while maintaining detection accuracy.
Under these challenges, existing methods cannot simultaneously achieve low computing-resource consumption and fast, accurate detection of power transformation equipment in aerial infrared images.
Disclosure of Invention
The technical problem to be solved by the invention: in view of the problems of the prior art, the invention provides a method for UAV inspection of power transformation equipment in complex scenes, which addresses the low infrared-image detection accuracy of existing methods while keeping computing-resource consumption low.
In order to solve the technical problems, the invention adopts the following technical scheme:
a method for UAV inspection of power transformation equipment in complex scenes comprises collecting an infrared image of the inspected power transformation equipment and inputting it into a pre-trained infrared image detection model of the power transformation equipment to obtain the detection result. The detection model comprises:
the backbone feature extraction network, used for extracting features at different levels from the input infrared image of the power transformation equipment. The backbone network GSCSPLnet adopts a ten-layer structure: the first layer is a Focus module; the second, fourth, sixth, and eighth layers are GSConv_BN_SiLU modules; the third, fifth, seventh, and tenth layers are GSCSP-L modules; and the ninth layer is the multi-level receptive field feature enhancement module MRFFEM. A GSCSP-L module is a sequential cascade of a GSCSP block and a long-distance feature capture attention mechanism LDFC: the output of the GSCSP feeds the input of the LDFC, and the output of the LDFC is the output of the GSCSP-L module. A GSConv_BN_SiLU module is a sequential cascade of a lightweight convolution GSConv, BatchNorm2d, and the SiLU activation function. The GSCSP adopts the basic CSPNet structure; each GSCSP consists of X GSResblocks and several GSConvs. A GSResblock consists of a main branch and a residual branch: the main branch is two GSConv_BN_SiLU modules in series, and the residual branch is a sequential series of a depthwise-separable convolution DWSConv, BatchNorm2d, and the SiLU activation function. The feature maps extracted by the third, fifth, sixth, seventh, eighth, and tenth layers of GSCSPLnet are denoted C2, C3, C4', C4, C5', and C5, respectively. The inputs of the MRFFEM are the feature maps C4' and C5' extracted by the backbone network, and the output of the MRFFEM passes through the tenth-layer GSCSP-L to yield C5;
the multi-scale feature aggregation network MFEN, used for performing multi-scale, multi-level feature aggregation on the feature maps C2, C3, C4, and C5 extracted by the backbone network to obtain feature maps P3, P4, and P5;
and the detection head network, used for performing detection on the aggregated features P3, P4, and P5 to obtain the detection result of the power transformation equipment.
Optionally, the long-distance feature capture attention mechanism LDFC operates as follows: first, the input feature map Z passes sequentially through an average pooling layer, a 1×1 convolution, and a linear activation function to obtain feature maps Ẑ_h and Ẑ_w. Then Ẑ_h is fed into a vertical adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_h in the vertical direction, and Ẑ_w is fed into a horizontal adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_w in the horizontal direction. Finally, a Sigmoid function and an upsampling operation convert z_h and z_w into globally learnable spatial weights α_H(h) and α_W(w).
Optionally, the multi-level receptive field feature enhancement module MRFFEM comprises an auxiliary-layer localization feature enhancement block ALFEB and a deep semantic pyramid pooling block DSPPB. The ALFEB has one input, three data processing branches, and one output; its input is the feature map C4'. Branch I of the ALFEB is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. Branch II is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. Branch III is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 7, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f1. The DSPPB likewise has one input, three data processing branches, and one output; its input is the feature map C5'. Branch I of the DSPPB is a sequential cascade of a 1×1 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 1, and SimAM attention. Branch II is a sequential cascade of a 3×3 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. Branch III is a sequential cascade of two 3×3 GSConvs, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f2. f1 and f2 are then concatenated and passed through a 1×1 convolution to obtain the final output.
Optionally, the multi-scale feature aggregation network MFEN consists of three stacked cross-scale fusion and reassembly feature aggregation blocks CARE and one simplified fusion and reassembly feature aggregation block SCARE. The inputs of the first CARE are the feature maps C5, C4, and C3: C3 is downsampled to obtain a feature map N3 at the same scale as C4, C5 is upsampled to obtain a feature map L5 at the same scale as C4, then N3, C4, and L5 are concatenated and passed through a GSCSP-L to obtain the output P4_1. The inputs of the second CARE are P4_1, C3, and C2: C2 is downsampled to obtain N2 at the same scale as C3, P4_1 is upsampled to obtain L4_1 at the same scale as C3, then N2, C3, and L4_1 are concatenated and passed through a GSCSP-L to obtain the output P3. The inputs of the third CARE are C5, P4_1, and P3: P3 is downsampled to obtain N3_1 at the same scale as P4_1, C5 is upsampled to obtain L5_1 at the same scale as P4_1, then N3_1, P4_1, and L5_1 are concatenated and passed through a GSCSP-L to obtain the output P4. The inputs of the SCARE are C5 and P4: C5 passes through the feature assistance module AFSB to obtain a feature map S5 at the same scale as C5, P4 is downsampled to obtain N4 at the same scale as C5, then S5, N4, and C5 are concatenated and passed through a GSCSP-L to obtain the output P5. Finally, P3, P4, and P5 are output to the first, second, and third detection heads of the detection head network, respectively.
Optionally, the feature assistance module AFSB consists of a main branch and a residual branch. The main branch contains two sub-branches: the first is a sequential cascade of a 1×1 convolution and a Softmax function, and the second is an adaptive pooling layer. The outputs of the two sub-branches are multiplied and then passed through a 1×1 convolution, LayerNorm, and the ReLU activation function to obtain the main-branch output. Finally, the main-branch output is concatenated with the residual branch to obtain the AFSB output.
Optionally, the detection head network includes a first detection head YOLO_Head1, a second detection head YOLO_Head2, and a third detection head YOLO_Head3. YOLO_Head1 performs detection on the 80×80 feature map, YOLO_Head2 on the 40×40 feature map, and YOLO_Head3 on the 20×20 feature map, each yielding detection results for the power transformation equipment.
Optionally, before the step of inputting the infrared image of the power transformation equipment into the pre-trained detection model, the method further comprises training the infrared image detection model of the power transformation equipment:
s1) acquiring infrared image samples and labeling them to construct an infrared image dataset of the power transformation equipment;
s2) constructing an infrared image detection model of the power transformation equipment;
s3) training the infrared image detection model of the power transformation equipment by adopting preset training parameters, and finally obtaining the pre-trained infrared image detection model of the power transformation equipment.
Optionally, the preset training parameters are: a standard SGD optimizer with an initial learning rate of 0.01 and momentum of 0.937; the weight decay is fixed at 5×10⁻⁴; and the model is trained for a total of 400 epochs.
In addition, the invention also provides an apparatus for UAV inspection of power transformation equipment in complex scenes, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the above method for UAV inspection of power transformation equipment in complex scenes.
Furthermore, the invention also provides a computer-readable storage medium storing a computer program to be executed by a computer device to implement the steps of the method for UAV inspection of power transformation equipment in complex scenes.
Compared with the prior art, the invention has the following advantages: it improves the detection accuracy of power transformation equipment in UAV aerial infrared images at low computing-resource cost, and can be applied to autonomous UAV inspection of substations and power transmission lines for daily monitoring of the operating state of power transformation equipment.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of a network structure of an infrared image detection model of a power transformation device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of GSCSP-L according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the structure of the attention of an LDFC in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a multi-level receptive field feature enhancement module MRFFEM in an embodiment of the invention.
Detailed Description
This embodiment provides a method for UAV inspection of power transformation equipment in complex scenes; the flow is shown in Fig. 1. The method comprises collecting an infrared image of the inspected power transformation equipment, constructing an infrared image detection model of the power transformation equipment, pre-training the model, and inputting the infrared image into the pre-trained model to obtain the detection result. The specific steps are as follows:
s1) acquiring infrared image samples and labeling them to construct an infrared image dataset of the power transformation equipment. In this embodiment the dataset contains seven classes of power transformation equipment: lightning arrester 1 (PB1), lightning arrester 2 (PB2), current transformer 1 (TA1), current transformer 2 (TA2), voltage transformer (TV), disconnecting switch (DS), and post porcelain insulator (PI). The dataset is divided into a training-validation set and a test set at a 9:1 ratio, and the training-validation set is further divided into a training set and a validation set at a 9:1 ratio.
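A minimal sketch of the two-stage 9:1 split described above; the shuffling and seed are illustrative assumptions, not patent content:

```python
import random

def split_dataset(image_paths, seed=0):
    """Split a list of image paths 9:1 into (train+val, test),
    then split train+val 9:1 into (train, val)."""
    rng = random.Random(seed)
    paths = list(image_paths)
    rng.shuffle(paths)
    n_test = len(paths) // 10              # 9:1 -> 10% held out for testing
    test, trainval = paths[:n_test], paths[n_test:]
    n_val = len(trainval) // 10            # second 9:1 split for validation
    val, train = trainval[:n_val], trainval[n_val:]
    return train, val, test
```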
S2) constructing an infrared image detection model of the power transformation equipment;
s3) training the infrared image detection model of the power transformation equipment by adopting preset training parameters, and finally obtaining the pre-trained infrared image detection model of the power transformation equipment.
In addition, the training parameters preset in step S3) of this embodiment are: a standard SGD optimizer with an initial learning rate of 0.01 and momentum of 0.937; the weight decay is fixed at 5×10⁻⁴; and the model is trained for a total of 400 epochs.
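These settings map directly onto a standard PyTorch SGD configuration; the stand-in model and the empty loop body below are placeholders, not patent content:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # stand-in for the detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.937, weight_decay=5e-4)

for epoch in range(400):  # 400 training epochs, as stated above
    ...                   # forward pass, loss, backward pass, optimizer.step()
```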
The loss function in this embodiment combines three terms: a classification loss L_cls, a confidence loss L_obj, and a localization loss L_CIoU. The total loss of the detection model is defined as follows:

LOSS = λ1·L_cls + λ2·L_obj + λ3·L_CIoU

where L_cls uses the binary cross-entropy function (BCE Loss) to compute the classification loss between the positive samples in the predicted boxes and the ground-truth boxes; L_obj, the confidence loss over whether each predicted box contains an object, also uses BCE Loss; and L_CIoU uses CIoU Loss (Complete Intersection over Union) to compute the localization error between the positive samples in the predicted boxes and the ground-truth boxes.
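A minimal sketch of the weighted total loss defined above; the weights λ1, λ2, λ3 are placeholders (their values are not disclosed here), and the CIoU term is assumed to be computed elsewhere:

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # BCE over raw logits

def total_loss(cls_pred, cls_target, obj_pred, obj_target, l_ciou,
               lam1=1.0, lam2=1.0, lam3=1.0):
    l_cls = bce(cls_pred, cls_target)  # classification loss on positive samples
    l_obj = bce(obj_pred, obj_target)  # objectness/confidence loss on all boxes
    return lam1 * l_cls + lam2 * l_obj + lam3 * l_ciou
```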
The network structure of the infrared image detection model of the power transformation equipment is shown in Fig. 2 and comprises:
the backbone feature extraction network, which extracts features at different levels from the input infrared image of the power transformation equipment;
the multi-scale feature aggregation network MFEN, which performs multi-scale, multi-level feature aggregation on the feature maps from the backbone network and sends the resulting maps to the detection head network;
and the detection head network, which performs detection on the aggregated feature maps to obtain the detection result of the power transformation equipment.
The backbone feature extraction network realizes multi-level feature extraction. As an optional implementation, as shown in Fig. 2, the backbone network in this embodiment is GSCSPLnet, which adopts a ten-layer structure: the first layer is a Focus module; the second, fourth, sixth, and eighth layers are GSConv_BN_SiLU modules; the third, fifth, seventh, and tenth layers are GSCSP-L modules; and the ninth layer is the multi-level receptive field feature enhancement module MRFFEM. The feature maps obtained as the input image passes through the third, fifth, sixth, seventh, eighth, and tenth layers of GSCSPLnet are denoted C2, C3, C4', C4, C5', and C5, respectively. C4' and C5' serve as the inputs of the MRFFEM, whose output passes through the tenth-layer GSCSP-L to obtain C5. C2, C3, C4, and C5 are then fed into the multi-scale feature aggregation network MFEN for further feature processing.
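To make the data flow concrete, here is a structural skeleton in PyTorch in which every stage is stubbed with a plain convolution; the channel widths and strides are assumptions chosen to be consistent with the 80×80, 40×40, and 20×20 head resolutions used later, and the real Focus, GSConv_BN_SiLU, GSCSP-L, and MRFFEM internals are described in the surrounding text:

```python
import torch
import torch.nn as nn

def stage(c_in, c_out, stride):
    # stand-in for one backbone stage (Conv + BN + SiLU)
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())

class GSCSPLnetSkeleton(nn.Module):
    def __init__(self, ch=(64, 128, 256, 512)):
        super().__init__()
        self.focus = stage(3, ch[0], 2)      # layer 1: Focus (stubbed)
        self.conv2 = stage(ch[0], ch[0], 2)  # layer 2: GSConv_BN_SiLU
        self.csp3 = stage(ch[0], ch[0], 1)   # layer 3: GSCSP-L -> C2
        self.conv4 = stage(ch[0], ch[1], 2)  # layer 4
        self.csp5 = stage(ch[1], ch[1], 1)   # layer 5: -> C3
        self.conv6 = stage(ch[1], ch[2], 2)  # layer 6: -> C4'
        self.csp7 = stage(ch[2], ch[2], 1)   # layer 7: -> C4
        self.conv8 = stage(ch[2], ch[3], 2)  # layer 8: -> C5'
        self.down9 = stage(ch[2], ch[3], 2)  # layer 9 stub: bring C4' to C5' scale
        self.fuse9 = nn.Conv2d(2 * ch[3], ch[3], 1)  # layer 9 stub: MRFFEM fusion
        self.csp10 = stage(ch[3], ch[3], 1)  # layer 10: GSCSP-L -> C5

    def forward(self, x):
        c2 = self.csp3(self.conv2(self.focus(x)))
        c3 = self.csp5(self.conv4(c2))
        c4p = self.conv6(c3)                 # C4'
        c4 = self.csp7(c4p)                  # C4
        c5p = self.conv8(c4)                 # C5'
        m = self.fuse9(torch.cat([self.down9(c4p), c5p], dim=1))  # layer 9 stub
        c5 = self.csp10(m)                   # C5
        return c2, c3, c4, c5
```

For a 640×640 input this yields C2 through C5 at strides 4, 8, 16, and 32, matching the 80×80, 40×40, and 20×20 head resolutions below.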
As shown in Fig. 3, a GSCSP-L module is a sequential cascade of a GSCSP block and the long-distance feature capture attention mechanism LDFC. GSCSPLnet contains four GSCSP-L modules, located at the third, fifth, seventh, and tenth layers. Each GSCSP-L splits the input feature map into two parts: the main part is a sequential cascade of a GSConv_BN_SiLU and X residual modules GSResblock, where X takes the values {3, 9, 9, 3} in the four GSCSP-L modules respectively; the other part passes through one GSConv_BN_SiLU and is concatenated directly with the main part, after which the LDFC attention is cascaded to give the GSCSP-L output. Within GSResblock, the residual edge adds a depthwise-separable convolution DWSConv, and the main part also uses the lightweight GSConv_BN_SiLU.
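A hedged sketch of GSConv_BN_SiLU and GSResblock follows. The patent names GSConv but does not spell out its internals; the formulation below (a dense convolution on half the output channels, a depthwise convolution on that half, then concatenation, omitting the usual channel shuffle) is the common slim-neck one and should be read as an assumption:

```python
import torch
import torch.nn as nn

class GSConvBNSiLU(nn.Module):
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(  # dense conv producing half the channels
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dw = nn.Sequential(     # depthwise conv on the dense half
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        a = self.dense(x)
        return torch.cat([a, self.dw(a)], dim=1)

class GSResblock(nn.Module):
    """Main branch: two cascaded GSConv_BN_SiLU; residual branch:
    depthwise-separable conv (DWSConv) + BatchNorm2d + SiLU."""
    def __init__(self, c):
        super().__init__()
        self.main = nn.Sequential(GSConvBNSiLU(c, c), GSConvBNSiLU(c, c))
        self.res = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1, groups=c, bias=False),  # depthwise
            nn.Conv2d(c, c, 1, bias=False),                  # pointwise
            nn.BatchNorm2d(c), nn.SiLU())

    def forward(self, x):
        return self.main(x) + self.res(x)
```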
The long-distance feature capture attention mechanism LDFC is shown in Fig. 4 and, in this embodiment, operates as follows. Let the original input feature map be Z ∈ R^{C×H×W}, where H is the height, W the width, and C the number of channels of the feature map. Z passes sequentially through a downsampling average pooling layer F_AvgPool, a 1×1 convolution F_Conv, and a linear activation function s, yielding feature maps Ẑ_h and Ẑ_w of size C'×H/2×W/2.

The feature map Ẑ_h is then fed into a vertical adaptive pooling layer, a central feature aggregation layer Γ_CFA, and a 1×1 convolution F_Conv, obtaining a long-distance spatial feature map z_h of size C×H/2×1 that captures the vertical direction. The trunk of the central feature aggregation layer is a cascade of a spatially separable convolution F_SSConv and a batch normalization layer F_BN, giving a feature f1; f1 then passes through the activation function F_ReLU to give f2; finally f1 and f2 are multiplied element-wise to give the output of the central feature aggregation layer:

Γ_CFA = F_BN(F_SSConv(x)) ⊙ F_ReLU(F_BN(F_SSConv(x)))

The feature map Ẑ_w is fed into a horizontal adaptive pooling layer, a central feature aggregation layer Γ_CFA, and a 1×1 convolution F_Conv, obtaining a long-distance spatial feature map z_w of size C×1×W/2 that captures the horizontal direction.

Finally, a Sigmoid function σ and an upsampling operation F_Upsample convert z_h and z_w into globally learnable spatial weights α_H(h) and α_W(w), which are multiplied with the input feature map Z to obtain the final attention feature map Y_C(h, w).
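Under the definitions above, the whole LDFC flow can be sketched as follows; the reduced width C', the kernel of the spatially separable convolution inside the central feature aggregation layer, and the sharing of one tensor for Ẑ_h and Ẑ_w are assumptions where the text leaves details open:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CFA(nn.Module):
    """Central feature aggregation: Γ_CFA = BN(SSConv(x)) ⊙ ReLU(BN(SSConv(x))).
    The spatially separable conv is stubbed as a single 3x1 (or 1x3) conv."""
    def __init__(self, c, vertical=True):
        super().__init__()
        k, p = ((3, 1), (1, 0)) if vertical else ((1, 3), (0, 1))
        self.conv = nn.Conv2d(c, c, k, 1, p, bias=False)
        self.bn = nn.BatchNorm2d(c)

    def forward(self, x):
        f1 = self.bn(self.conv(x))
        return f1 * F.relu(f1)  # element-wise product of f1 and f2 = ReLU(f1)

class LDFC(nn.Module):
    def __init__(self, c, c_mid=None):
        super().__init__()
        c_mid = c_mid or c  # reduced width C' (assumption)
        # F_AvgPool + 1x1 F_Conv; the linear activation s is the identity
        self.pre = nn.Sequential(nn.AvgPool2d(2), nn.Conv2d(c, c_mid, 1))
        self.cfa_h, self.cfa_w = CFA(c_mid, True), CFA(c_mid, False)
        self.proj_h = nn.Conv2d(c_mid, c, 1)
        self.proj_w = nn.Conv2d(c_mid, c, 1)

    def forward(self, z):
        h, w = z.shape[2], z.shape[3]
        zh = self.pre(z)   # Ẑ_h of size C' x H/2 x W/2
        zw = zh            # Ẑ_w; shared tensor here for brevity
        # vertical branch: pool the width away -> z_h of size C x H/2 x 1
        ah = self.proj_h(self.cfa_h(F.adaptive_avg_pool2d(zh, (zh.shape[2], 1))))
        # horizontal branch: pool the height away -> z_w of size C x 1 x W/2
        aw = self.proj_w(self.cfa_w(F.adaptive_avg_pool2d(zw, (1, zw.shape[3]))))
        # Sigmoid + upsampling -> global spatial weights α_H(h), α_W(w)
        ah = F.interpolate(torch.sigmoid(ah), size=(h, 1))
        aw = F.interpolate(torch.sigmoid(aw), size=(1, w))
        return z * ah * aw  # Y_C(h, w)
```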
As shown in Fig. 5, the multi-level receptive field feature enhancement module MRFFEM in GSCSPLnet comprises, in this embodiment, an auxiliary-layer localization feature enhancement block ALFEB and a deep semantic pyramid pooling block DSPPB. The ALFEB has three data processing branches, with C4' (of size (40, 40, 256)) as their common input. The first branch of the ALFEB is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. The second branch is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. The third branch is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 7, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f1 of size (20, 20, 512). The DSPPB also has three data processing branches, with C5' (of size (20, 20, 512)) as their common input. The first branch of the DSPPB is a sequential cascade of a 1×1 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 1, and SimAM attention. The second branch is a sequential cascade of a 3×3 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 3, and SimAM attention. The third branch is a sequential cascade of two 3×3 GSConvs, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 5, and SimAM attention. Finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f2 of size (20, 20, 512). f1 and f2 are concatenated and passed through a 1×1 convolution to obtain the final output of size (20, 20, 512).
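A sketch of the ALFEB branch pattern follows; the DSPPB follows the same concatenate-then-1×1 scheme with GSConv and adaptive pooling in place of the separable pairs. SimAM is parameter-free and implemented here in its standard energy form; channel widths and the stride needed to reach the stated (20, 20, 512) output are assumptions or elided:

```python
import torch
import torch.nn as nn

def simam(x, eps=1e-4):
    # parameter-free SimAM attention (standard energy-based formulation)
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n
    return x * torch.sigmoid(d / (4 * (v + eps)) + 0.5)

def sep_pair(c):
    # one 1x3 + 3x1 spatially separable pair
    return [nn.Conv2d(c, c, (1, 3), 1, (0, 1)), nn.Conv2d(c, c, (3, 1), 1, (1, 0))]

def dil(c, rate):
    # 3x3 dilated convolution with the given dilation rate
    return nn.Conv2d(c, c, 3, 1, padding=rate, dilation=rate)

class ALFEB(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.b1 = nn.Sequential(*sep_pair(c_in), dil(c_in, 3))                   # branch I
        self.b2 = nn.Sequential(*sep_pair(c_in), *sep_pair(c_in), dil(c_in, 5))  # branch II
        self.b3 = nn.Sequential(*sep_pair(c_in), dil(c_in, 7))                   # branch III
        self.fuse = nn.Conv2d(3 * c_in, c_out, 1)  # 1x1 conv after concatenation

    def forward(self, x):
        y = torch.cat([simam(self.b1(x)), simam(self.b2(x)), simam(self.b3(x))], dim=1)
        return self.fuse(y)  # -> f1 (downsampling to (20, 20, 512) elided)
```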
As shown in Fig. 2, the multi-scale feature aggregation network MFEN in this embodiment consists of three stacked cross-scale fusion and reassembly feature aggregation blocks CARE and one simplified fusion and reassembly feature aggregation block SCARE. The inputs of the first CARE are C5, C4, and C3: C3 is downsampled to obtain a feature map N3 at the same scale as C4, C5 is upsampled to obtain a feature map L5 at the same scale as C4, then N3, C4, and L5 are concatenated and passed through a GSCSP-L to obtain the output P4_1. The inputs of the second CARE are P4_1, C3, and C2: C2 is downsampled to obtain N2 at the same scale as C3, P4_1 is upsampled to obtain L4_1 at the same scale as C3, then N2, C3, and L4_1 are concatenated and passed through a GSCSP-L to obtain the output P3. The inputs of the third CARE are C5, P4_1, and P3: P3 is downsampled to obtain N3_1 at the same scale as P4_1, C5 is upsampled to obtain L5_1 at the same scale as P4_1, then N3_1, P4_1, and L5_1 are concatenated and passed through a GSCSP-L to obtain the output P4. The inputs of the SCARE are C5 and P4: C5 passes through the feature assistance module AFSB to obtain a feature map S5 at the same scale as C5, P4 is downsampled to obtain N4 at the same scale as C5, then S5, N4, and C5 are concatenated and passed through a GSCSP-L to obtain the output P5. Finally, P3, P4, and P5 are output to the first, second, and third detection heads of the detection head network, respectively.
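One CARE step can be sketched as below, with the GSCSP-L after concatenation stubbed as a 1×1 convolution; channel widths are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARE(nn.Module):
    def __init__(self, ch_fine, ch_mid, ch_coarse, ch_out):
        super().__init__()
        self.down = nn.Conv2d(ch_fine, ch_fine, 3, 2, 1)  # downsample the finer map
        self.fuse = nn.Conv2d(ch_fine + ch_mid + ch_coarse, ch_out, 1)  # GSCSP-L stub

    def forward(self, fine, mid, coarse):
        n = self.down(fine)                            # e.g. N3 from C3
        l = F.interpolate(coarse, size=mid.shape[2:])  # e.g. L5 from C5
        return self.fuse(torch.cat([n, mid, l], dim=1))  # e.g. P4_1

# e.g. p4_1 = CARE(128, 256, 512, 256)(c3, c4, c5) for the first CARE above
```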
The MFEN structure introduces a feature assistance module AFSB, as shown in Fig. 2. In this embodiment, the AFSB consists of a main branch and a residual branch. The main branch contains two sub-branches: the first is a sequential cascade of a 1×1 convolution and a Softmax function; the second is an adaptive pooling layer. The outputs of the two sub-branches are multiplied and then passed through a 1×1 convolution, LayerNorm, and the ReLU activation function to obtain the main-branch output. Finally, the main-branch output is concatenated with the residual branch to obtain the AFSB output.
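A sketch of the AFSB under stated assumptions: the Softmax is taken over spatial positions, the adaptive pooling output size is kept equal to the input, and GroupNorm with one group stands in for LayerNorm on convolutional feature maps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFSB(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.attn = nn.Conv2d(c, c, 1)  # sub-branch 1: 1x1 conv (Softmax applied below)
        self.out = nn.Conv2d(c, c, 1)
        self.norm = nn.GroupNorm(1, c)  # stand-in for LayerNorm on conv features

    def forward(self, x):
        b, c, h, w = x.shape
        a = torch.softmax(self.attn(x).flatten(2), dim=-1).view(b, c, h, w)  # spatial Softmax
        p = F.adaptive_avg_pool2d(x, (h, w))     # sub-branch 2: adaptive pooling
        m = F.relu(self.norm(self.out(a * p)))   # main-branch output
        return torch.cat([m, x], dim=1)          # concatenate with residual branch
```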
Finally, the three outputs P3, P4, and P5 of the multi-scale feature aggregation network are passed into the detection head network for final prediction. As shown in Fig. 2, the detection head network in this embodiment includes a first detection head YOLO_Head1, a second detection head YOLO_Head2, and a third detection head YOLO_Head3. YOLO_Head1 performs detection on the 80×80 feature map, YOLO_Head2 on the 40×40 feature map, and YOLO_Head3 on the 20×20 feature map, each yielding detection results for the power transformation equipment.
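In the YOLO convention each head reduces to one 1×1 prediction convolution per scale; num_anchors = 3 is an assumption, while num_classes = 7 matches the seven device classes of this embodiment:

```python
import torch.nn as nn

def make_heads(channels=(128, 256, 512), num_anchors=3, num_classes=7):
    # one 1x1 prediction conv per scale:
    # per anchor, 4 box coordinates + 1 objectness + num_classes class scores
    out_ch = num_anchors * (5 + num_classes)
    return nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in channels)

# For a 640x640 input: heads[0] runs on P3 (80x80), heads[1] on P4 (40x40),
# heads[2] on P5 (20x20).
```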
To verify the high-accuracy detection achieved by the method of this example, its detection accuracy is compared with two-stage models (EfficientDet-D1 and Faster R-CNN), one-stage models (YOLOv8, YOLOv7, YOLOX-s, YOLOv5-s, ConvNeXt, and CenterNet), an anchor-free model (FCOS), and Transformer-based models (DETR and Swin). The evaluation uses the detection accuracy mAP50, the number of Parameters, the computational cost in FLOPs, and the detection speed in FPS; the experimental results are shown in Table 1.
Table 1. Comparison of the experimental results of the present invention and other algorithms on this dataset
In Table 1, mAP50 (%) denotes the average precision over the seven classes of power transformation equipment at an IoU threshold of 0.5. As Table 1 shows, the mAP50 of the method of this example is 99.2%, the best among all methods and 2.08% higher than the second-best YOLOX-s. The method adds only a small number of parameters and little computation while delivering high-accuracy detection, and its single-image detection speed reaches 105.3 frames per second, meeting the requirement of real-time detection.
To verify the effect of the long-distance feature capture attention mechanism LDFC in the method of this example, it is compared with mainstream attention modules: SimAM, SE, ECA, and CBAM. Each attention module was added to YOLOv5-s for testing; the experimental results are shown in Table 2.
Table 2. Experimental results of the proposed LDFC attention versus other attention modules
As Table 2 shows, for infrared images of power transformation equipment, introducing the LDFC guides the model to focus on important regions and improves detection performance relative to the other attention mechanisms. Moreover, for targets appearing at image edges, the LDFC establishes associations between image boundary regions and important regions, and its long-distance spatial capture recovers missing edge information, which further helps detection performance.
To verify the effect of the multi-level receptive field feature enhancement module MRFFEM in the method of this example, MRFFEM is compared with the mainstream spatial pyramid pooling (SPP) modules SPPF, SimSPPF, ASPP, RFB, SPPCSPC, and SPPFCSPC. SPPF is the serial SPP structure in YOLOv5; SimSPPF is the SPP structure in YOLOv6; ASPP builds multi-branch pooling on top of SPP using dilated convolutions with kernels of different sizes; SPPCSPC is the SPP structure in YOLOv7; and SPPFCSPC is the serial form of SPPCSPC. Each structure was added to YOLOv5 and tested; the results are shown in Table 3.
Table 3. Comparison of the experimental results of the proposed MRFFEM and other spatial pyramid pooling modules
As Table 3 shows, for infrared images of power transformation equipment, the MRFFEM uses convolution layers of different sizes to aggregate and extract features of the target at different granularities, enriching shape, texture, and structural features and improving the model's detection accuracy relative to the other SPP modules.
To verify the effect of the multi-scale feature aggregation network MFEN in the method of this example, CARE was split in YOLOv5-s into an upsampling-only variant OLUP (OnlyUpsample) and a downsampling-only variant OLDW (OnlyDownsample), and the feature assistance module AFSB was added separately for testing; the experimental results are shown in Table 4.
Table 4. Ablation comparison of the proposed MFEN sub-modules
As Table 4 shows, for infrared images of power transformation equipment, neither upsampling-only OLUP nor downsampling-only OLDW improves the model's detection accuracy, whereas the simultaneous upsampling and downsampling operations introduced by CARE within the MFEN do improve the network's detection performance. After the feature assistance module AFSB is introduced, semantic information useful to the model is supplemented, reducing optimization difficulty and improving the detection effect. In summary, the network model provided by this method for UAV inspection of power transformation equipment in complex scenes is both real-time and highly accurate, can be deployed in UAV inspection engineering applications, realizes automatic identification and localization of power transformation equipment in infrared images, and can be applied to daily inspection of equipment operating states in autonomous UAV inspection of substations and power transmission lines.
In addition, this embodiment provides an apparatus for UAV inspection of power transformation equipment in complex scenes, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the above method for UAV inspection of power transformation equipment in complex scenes.
This embodiment also provides a computer-readable storage medium storing a computer program to be executed by a computer device to implement the steps of the method for UAV inspection of power transformation equipment in complex scenes.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims (8)

1. A method for UAV inspection of power transformation equipment in complex scenes, characterized by comprising: collecting, by an unmanned aerial vehicle, an infrared image of the inspected power transformation equipment of a substation, and inputting the infrared image into a pre-trained infrared image detection model of the power transformation equipment to obtain the detection result of the power transformation equipment, wherein the step of constructing the infrared image detection model comprises:
constructing a backbone feature extraction network GSCSPLnet, which adopts a ten-layer structure: the first layer is a Focus module; the second, fourth, sixth, and eighth layers are GSConv_BN_SiLU modules; the third, fifth, seventh, and tenth layers are GSCSP-L modules; and the ninth layer is the multi-level receptive field feature enhancement module MRFFEM; the GSCSP-L module is a sequential cascade of a GSCSP block and a long-distance feature capture attention mechanism LDFC, wherein the output of the GSCSP is connected to the input of the LDFC and the output of the LDFC is the output of the GSCSP-L module; the GSConv_BN_SiLU module is a sequential cascade of a lightweight convolution GSConv, BatchNorm2d, and the SiLU activation function; the GSCSP adopts the basic CSPNet structure, each GSCSP consisting of X GSResblocks and several GSConvs, wherein a GSResblock consists of a main branch and a residual branch, the main branch being two GSConv_BN_SiLU modules in series and the residual branch being a sequential series of a depthwise-separable convolution DWSConv, BatchNorm2d, and the SiLU activation function; the feature maps extracted by the third, fifth, sixth, seventh, eighth, and tenth layers of GSCSPLnet are denoted C2, C3, C4', C4, C5', and C5, respectively; the inputs of the MRFFEM are the feature maps C4' and C5' extracted by the backbone network, and the output of the MRFFEM passes through a GSCSP-L to obtain C5;
constructing a multi-scale feature aggregation network MFEN for performing multi-scale, multi-level feature aggregation on the feature maps C2, C3, C4, and C5 extracted by the backbone network to obtain feature maps P3, P4, and P5;
and constructing a detection head network for performing detection on the aggregated features P3, P4, and P5 to obtain the detection result of the power transformation equipment.
2. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the long-distance feature capture attention mechanism LDFC operates as follows: first, the input feature map Z passes sequentially through an average pooling layer, a 1×1 convolution, and a linear activation function to obtain feature maps Ẑ_h and Ẑ_w; then Ẑ_h is fed into a vertical adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_h in the vertical direction, and Ẑ_w is fed into a horizontal adaptive pooling layer, a central feature aggregation layer, and a 1×1 convolution to capture the long-distance spatial feature z_w in the horizontal direction; finally, a Sigmoid function and an upsampling operation convert z_h and z_w into globally learnable spatial weights α_H(h) and α_W(w).
3. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the multi-level receptive field feature enhancement module MRFFEM comprises an auxiliary-layer localization feature enhancement block ALFEB and a deep semantic pyramid pooling block DSPPB; the ALFEB has one input, three data processing branches, and one output, the input of the ALFEB being the feature map C4'; branch I of the ALFEB is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 3, and SimAM attention; branch II is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 5, and SimAM attention; branch III is a sequential cascade of a 1×3 spatially separable convolution, a 3×1 spatially separable convolution, a 3×3 dilated convolution with dilation rate 7, and SimAM attention; finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f1; the DSPPB has one input, three data processing branches, and one output, the input of the DSPPB being the feature map C5'; branch I of the DSPPB is a sequential cascade of a 1×1 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 1, and SimAM attention; branch II is a sequential cascade of a 3×3 GSConv, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 3, and SimAM attention; branch III is a sequential cascade of two 3×3 GSConvs, an adaptive pooling layer, a 3×3 dilated convolution with dilation rate 5, and SimAM attention; finally, the outputs of the three branches are aggregated by concatenation and passed through a 1×1 convolution to obtain the output f2; f1 and f2 are then concatenated and passed through a 1×1 convolution to obtain the final output.
4. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the multi-scale feature aggregation network MFEN consists of three stacked cross-scale fusion and reassembly feature aggregation blocks CARE and one simplified fusion and reassembly feature aggregation block SCARE; the inputs of the first CARE are the feature maps C5, C4, and C3: C3 is downsampled to obtain a feature map N3 at the same scale as C4, C5 is upsampled to obtain a feature map L5 at the same scale as C4, then N3, C4, and L5 are concatenated and passed through a GSCSP-L to obtain the output P4_1; the inputs of the second CARE are P4_1, C3, and C2: C2 is downsampled to obtain N2 at the same scale as C3, P4_1 is upsampled to obtain L4_1 at the same scale as C3, then N2, C3, and L4_1 are concatenated and passed through a GSCSP-L to obtain the output P3; the inputs of the third CARE are C5, P4_1, and P3: P3 is downsampled to obtain N3_1 at the same scale as P4_1, C5 is upsampled to obtain L5_1 at the same scale as P4_1, then N3_1, P4_1, and L5_1 are concatenated and passed through a GSCSP-L to obtain the output P4; the inputs of the SCARE are C5 and P4: C5 passes through the feature assistance module AFSB to obtain a feature map S5 at the same scale as C5, P4 is downsampled to obtain N4 at the same scale as C5, then S5, N4, and C5 are concatenated and passed through a GSCSP-L to obtain the output P5; finally, P3, P4, and P5 are output to the first, second, and third detection heads of the detection head network, respectively.
5. The method for UAV inspection of power transformation equipment in complex scenes according to claim 1, wherein the detection head network includes a first detection head YOLO_Head1, a second detection head YOLO_Head2, and a third detection head YOLO_Head3; YOLO_Head1 performs detection on the 80×80 feature map, YOLO_Head2 on the 40×40 feature map, and YOLO_Head3 on the 20×20 feature map, each yielding detection results for the power transformation equipment.
6. The method for UAV inspection of power transformation equipment in complex scenes according to claim 4, wherein the feature assistance module AFSB consists of a main branch and a residual branch; the main branch contains two sub-branches: the first is a sequential cascade of a 1×1 convolution and a Softmax function, and the second is an adaptive pooling layer; the outputs of the two sub-branches are multiplied and then passed through a 1×1 convolution, LayerNorm, and the ReLU activation function to obtain the main-branch output; finally, the main-branch output is concatenated with the residual branch to obtain the AFSB output.
7. An apparatus for UAV inspection of power transformation equipment in complex scenes, comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to perform the steps of the method for UAV inspection of power transformation equipment in complex scenes according to any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for execution by a computer device to implement the steps of the method for UAV inspection of power transformation equipment in complex scenes according to any one of claims 1 to 6.
CN202311381395.XA 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene Pending CN117392568A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311381395.XA CN117392568A (en) 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311381395.XA CN117392568A (en) 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene

Publications (1)

Publication Number Publication Date
CN117392568A true CN117392568A (en) 2024-01-12

Family

ID=89469791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311381395.XA Pending CN117392568A (en) 2023-10-23 2023-10-23 Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene

Country Status (1)

Country Link
CN (1) CN117392568A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975372A * 2024-03-29 2024-05-03 山东浪潮科学研究院有限公司 Construction site safety detection system and method based on YOLOv and Transformer encoder


Similar Documents

Publication Publication Date Title
CN109543606B (en) Human face recognition method with attention mechanism
CN111046821B (en) Video behavior recognition method and system and electronic equipment
CN112070729B (en) Anchor-free remote sensing image target detection method and system based on scene enhancement
CN110378222A (en) A kind of vibration damper on power transmission line target detection and defect identification method and device
CN104517103A (en) Traffic sign classification method based on deep neural network
Cepni et al. Vehicle detection using different deep learning algorithms from image sequence
CN105184229A (en) Online learning based real-time pedestrian detection method in dynamic scene
CN117392568A (en) Method for unmanned aerial vehicle inspection of power transformation equipment in complex scene
CN113569672A (en) Lightweight target detection and fault identification method, device and system
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN109241814A (en) Pedestrian detection method based on YOLO neural network
Wanguo et al. Typical defect detection technology of transmission line based on deep learning
CN113128476A (en) Low-power consumption real-time helmet detection method based on computer vision target detection
CN116630668A (en) Method for identifying wearing abnormality of safety helmet in quick lightweight manner
CN112837281B (en) Pin defect identification method, device and equipment based on cascade convolution neural network
CN117132910A (en) Vehicle detection method and device for unmanned aerial vehicle and storage medium
CN116820131A (en) Unmanned aerial vehicle tracking method based on target perception ViT
Zhan Electric equipment inspection on high voltage transmission line via Mobile Net-SSD
Wang et al. Research on appearance defect detection of power equipment based on improved faster-rcnn
Li et al. Multi-scale feature extraction and fusion net: Research on UAVs image semantic segmentation technology
CN112364878A (en) Power line classification method based on deep learning under complex background
Li et al. Lightweight Real-time Object Detection System Based on Embedded AI Development Kit
Gan et al. Intelligent fault diagnosis with deep architecture
Wu et al. Detection of Defects in Power Grid Inspection Images Based on Multi-scale Fusion
CN112733632B (en) Robot control method based on face recognition and gesture recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination