CN117115432A - Defect detection method and device for distribution line, electronic equipment and medium - Google Patents
Defect detection method and device for distribution line, electronic equipment and medium
- Publication number
- CN117115432A (application CN202311380747.XA)
- Authority
- CN
- China
- Prior art keywords
- detection model
- feature map
- query
- model
- distribution line
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06N3/045—Combinations of networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/0499—Feedforward networks
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
- G06V2201/07—Target detection
Abstract
The invention discloses a defect detection method and device for a distribution line, an electronic device, and a medium. The method comprises the following steps: acquiring a preprocessed distribution line picture; extracting features from the preprocessed distribution line picture to obtain a first feature map; inputting the first feature map into the encoding layer of a target detection model to obtain a second feature map and a first position code; inputting the second feature map and the first position code into the decoding layer of the target detection model to obtain a plurality of detection results, wherein the decoding layer is trained by having the model learn a randomly initialized content query vector and by updating the matching query vectors with the assistance of denoising query vectors, and the denoising query vectors are removed after model training is completed; and determining the defect detection result of the distribution line picture from the plurality of detection results. With this method, both the training efficiency and the detection performance of the target detection model can be improved.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for detecting defects of a distribution line, an electronic device, and a medium.
Background
In the related art, a DEtection TRansformer (DETR) object detection model can be used to distinguish objects to be detected from the background in an image and to predict the position and class of each object. The DETR model is simple in principle and needs no complex post-processing logic, so detection results can be output directly; however, its detection performance is limited and its training converges slowly.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a defect detection method, apparatus, electronic device, and medium for a distribution line that address the DETR model's limited detection accuracy and slow training speed.
A defect detection method for a distribution line, comprising the following steps:
acquiring a preprocessed distribution line picture;
extracting features of the preprocessed distribution line pictures to obtain a first feature map;
inputting the first feature map into an encoding layer of a target detection model to obtain a second feature map and a first position code;
inputting the second feature map and the first position code into a decoding layer of the target detection model to obtain a plurality of detection results, wherein the decoding layer of the target detection model is trained by having the model learn a randomly initialized content query vector and by updating the matching query vectors with the assistance of denoising query vectors, the denoising query vectors being removed after model training is completed;
and determining a defect detection result of the distribution line picture according to the detection results.
A defect detection device for a distribution line, comprising:
the first acquisition module is used for acquiring the preprocessed distribution line pictures;
the second acquisition module is used for carrying out feature extraction on the preprocessed distribution line pictures to obtain a first feature map;
the third acquisition module is used for inputting the first characteristic diagram into the coding layer of the target detection model to obtain a second characteristic diagram and a first position code;
the detection module is used for inputting the second feature map and the first position code into a decoding layer of the target detection model to obtain a plurality of detection results, wherein the decoding layer of the target detection model is trained by having the model learn a randomly initialized content query vector and by updating the matching query vectors with the assistance of denoising query vectors, and the denoising query vectors are removed after model training is completed;
And the determining module is used for determining the defect detection result of the distribution line picture according to the detection results.
An electronic device comprising a memory storing a computer program and a processor implementing the steps of the above-mentioned method for detecting defects of a distribution line when the processor executes the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described defect detection method for a distribution line.
According to the defect detection method, device, electronic device, and medium for a distribution line described above, the model learns a randomly initialized content query vector and the matching query vectors are updated with the assistance of denoising query vectors during training, and the denoising query vectors are removed once training is completed. As a result, the target detection model converges faster during training, and its detection performance and detection accuracy are improved.
Drawings
FIG. 1 is a schematic diagram of a target detection model;
FIG. 2 is a flow chart of a defect detection method for a distribution line in one embodiment;
FIG. 3 is a flow diagram of training a detection model in one embodiment;
FIG. 4 is a flow chart of processing the training results output by the detection model in one embodiment;
FIG. 5 is a flow diagram of a detection model processing input data in one embodiment;
FIG. 6 is a block diagram of a defect detection device for a distribution line in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Before explaining the embodiments of the present application in detail, a brief description is first given of a target detection method in the related art.
In the related art, artificial intelligence is increasingly applied in the image field, and object detection in particular has developed rapidly. Object detection frameworks such as the YOLO (You Only Look Once) series, the Fast R-CNN series, and the CornerNet series share a common trait: their raw outputs must pass through complex post-processing logic before correct output boxes are obtained, so they cannot be processed end to end in the true sense.
On this basis, the end-to-end DETR model was proposed. Its principle is very simple and requires no complex post-processing logic: an image is input, features are extracted by a convolution module, the three-dimensional image features are converted into a one-dimensional sequence by a 1×1 convolution, and the sequence is fed to the encoding layer of the DETR model. The encoding layer of the DETR model consists of several self-attention modules and a feedforward neural network; the one-dimensional sequence passes through the encoding layer, then through a decoding layer composed of several self-attention modules and a cross-attention module, and finally through a feedforward neural network to produce the final output result.
During training of the DETR model, the query vectors in its decoding layer are randomly initialized and updated only through iterative training, so the DETR model trains slowly and the trained model's detection performance is limited.
Based on the above, the present application provides a defect detection method, device, electronic device, and medium for a distribution line, so as to improve the training efficiency and the detection performance of the DETR model and thereby identify the various defects of a distribution line more accurately.
Implementation details of the technical scheme of the embodiment of the present application are described in detail below.
The defect detection method of the distribution line provided by the application can be applied to the architecture diagram of the target detection model shown in fig. 1, and the defect detection method of the distribution line in the embodiment of the application is described in detail below with reference to the model structure diagram shown in fig. 1.
In one embodiment, as shown in fig. 2, a defect detection method for a distribution line is provided, which may include the following steps:
step S201, a preprocessed distribution line picture is acquired.
The distribution line pictures refer to distribution line pictures to be detected, and the pictures need to be input into a target detection model for defect detection.
The distribution line picture to be detected is acquired and preprocessed so that useful information in the image can be extracted and enhanced, providing better input for subsequent image analysis and therefore a more accurate detection result. The preprocessing of the distribution line picture may include:
image scaling: adjusting the distribution line picture to a fixed size, and generally scaling the distribution line picture to an input size required by a target detection model;
Image normalization: carrying out normalization processing on the pixel value of the distribution line picture to enable the pixel value to fall within a specific range, wherein the normalization method can adopt a mode of dividing the pixel value of the distribution line picture by the width and height values of the distribution line picture;
image channel order: according to the requirements of the target detection model, the channel order of the distribution line picture is adjusted; a common adjustment is converting the channel order from RGB to BGR.
In practical applications, the preprocessing operation of the distribution line pictures is not limited, and the preprocessing operation of the distribution line pictures can be completed by using functions provided by an image processing library or a deep learning framework.
The purpose of preprocessing the distribution line pictures is to convert the distribution line pictures into a format meeting the input requirements of the target detection model, so that the distribution line pictures are input into the target detection model for defect detection tasks.
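For illustration, a minimal preprocessing sketch of the three operations above (scaling, normalization, channel reordering), using OpenCV; the 640×640 input size and division by 255 are assumptions rather than values taken from this disclosure:

```python
import cv2
import numpy as np

def preprocess(image_path: str, target_size=(640, 640)) -> np.ndarray:
    """Scale, normalize, and reorder channels of a distribution line picture."""
    img = cv2.imread(image_path)              # OpenCV loads images as BGR by default
    img = cv2.resize(img, target_size)        # image scaling to the model's input size
    img = img.astype(np.float32) / 255.0      # normalization to a fixed range (a common choice)
    # If the source image were RGB, the channel order could be adjusted here:
    # img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    return img
```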
Step S202, feature extraction is carried out on the preprocessed distribution line pictures, and a first feature map is obtained.
The feature extraction is performed on the preprocessed distribution line picture by using the convolution module, and in practical application, the preprocessed distribution line picture can be input into the ResNet-50 network to obtain the first feature map.
The ResNet-50 network has 50 layers, including 49 convolution layers and a full connection layer, and can be specifically divided into seven parts, wherein the first part does not contain residual blocks, and mainly carries out convolution, regularization, activation function and maximum pooling calculation on the input distribution line picture. The second, third, fourth and fifth part structures all comprise residual blocks, and the residual blocks all have three-layer convolution. The corresponding output is obtained through convolution calculation of the first five parts, and the pooling layer can convert the output obtained through convolution calculation of the first five parts into a corresponding feature map, so that a first feature map is obtained.
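A sketch of extracting the first feature map with a torchvision ResNet-50 backbone; keeping the spatial map by dropping the average-pooling and fully connected layers is a common choice for detection and is an assumption here, as is the 640×640 input size:

```python
import torch
import torchvision

# Build a ResNet-50 and keep everything up to (and including) the last residual
# stage, dropping the average-pooling and fully connected layers.
resnet = torchvision.models.resnet50(weights=None)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

x = torch.randn(1, 3, 640, 640)        # one preprocessed distribution line picture
with torch.no_grad():
    feature_map = backbone(x)          # first feature map, shape (1, 2048, 20, 20)
print(feature_map.shape)
```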
Step S203, inputting the first feature map into the coding layer of the target detection model to obtain a second feature map and a first position code.
Here, the obtained first feature map is input to a target detection model, wherein the target detection model includes an encoding layer and a decoding layer, and the input first feature map is first processed by the encoding layer of the target detection model.
In practical application, the encoding layer of the target detection model consists of a multi-head self-attention module and a feedforward neural network. The first feature map is first processed by the self-attention module, which uses the attention mechanism to compute correlations within the first feature map so as to better capture global context information. It is then further processed by the feedforward neural network, which is composed of several fully connected layers and performs nonlinear transformation and feature mapping on the first feature map to obtain a new second feature map. In this way, the target detection model can extract and represent higher-level features of the input first feature map in the encoding layer.
When the encoding layer of the target detection model processes the first feature map, the feature map is flattened into a sequence of feature vectors, so the extracted second feature map loses the spatial information of the original distribution line picture. Because the positional relationship of targets is critical to the detection task, the second feature map must also be position-encoded in the encoding layer to help the target detection model understand positional relationships, yielding the first position code corresponding to the second feature map.
The position coding is a process of introducing position information for the second feature map, and by adding position information for the second feature map, the object detection model can know the distance and the relative position relation between different positions. In practical application, the position codes are generated through sine functions and cosine functions, so that unique codes can be provided for feature vectors at different positions in the second feature map, and meanwhile, certain periodicity is maintained, so that the target detection model can capture the relation between different positions.
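A sketch of sine/cosine position encoding in the spirit described above; the embedding dimension, the frequency base of 10000, and the 20×20 feature map size are assumptions:

```python
import math
import torch

def sinusoidal_position_encoding(num_positions: int, d_model: int = 256) -> torch.Tensor:
    """Return a (num_positions, d_model) table of sine/cosine position codes."""
    pos = torch.arange(num_positions, dtype=torch.float32).unsqueeze(1)   # (L, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)                  # even dimensions
    freq = torch.exp(-math.log(10000.0) * i / d_model)                    # 1 / 10000^(2i/d)
    pe = torch.zeros(num_positions, d_model)
    pe[:, 0::2] = torch.sin(pos * freq)    # sine on even channels
    pe[:, 1::2] = torch.cos(pos * freq)    # cosine on odd channels
    return pe

# one code per flattened feature-map location, e.g. a 20x20 map -> 400 positions
codes = sinusoidal_position_encoding(400)
```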
Step S204, the second feature map and the first position codes are input to a decoding layer of the target detection model, and a plurality of detection results are obtained.
Here, the decoding layer of the target detection model is trained by having the model learn a randomly initialized content query vector and by updating the matching query vectors (Match Query) with the assistance of denoising query vectors (Denoise Query), where the Match Query consists of N preset prediction frames (anchors) and their corresponding class labels, and the Denoise Query consists of M generated noisy real target frames and their corresponding class labels.
It should be noted that, the anchor in the Match Query is obtained by setting according to a priori experience, and is a bounding box for predicting the position and class of each target in the target detection task; the real target frame is a position frame of the real defect marked by the dataset for model training.
Introducing the Match Query during training of the target detection model, with preset anchors and class labels, allows the model to converge faster than with purely randomly initialized queries and improves the model's overall detection performance. Introducing the Denoise Query, which contains real target frames and corresponding class labels, likewise helps the model converge faster during the training stage. Therefore, introducing the Match Query and the Denoise Query during training accelerates model convergence, improves training efficiency, and also improves the detection performance of the target detection model.
During model training, the decoding layer contains a randomly initialized content query vector. The decoding layer learns the input training data through the content query vector, the Match Query, and the Denoise Query, and iteratively updates the content query vector and the Match Query; after a number of training iterations, training of the model is completed.
In this embodiment, the Denoise Query participates in the calculation only in the training stage of the target detection model, and after the model training is completed, the Denoise Query is removed, that is, only the Match Query exists in the target detection model, and no Denoise Query is contained in the target detection model.
The decoding layer of the target detection model comprises four layers, each containing a self-attention module, a cross-attention module, and feedforward neural networks, and it processes the input second feature map and first position code. Specifically, the self-attention module in the decoding layer processes the Match Query and the content query vector and computes attention weights; the output of the self-attention module, the second feature map, and the first position code are then fed to the cross-attention module, which cross-matches the Match Query with the features output by the encoding layer to obtain accurate detection results. A multi-layer perceptron (MLP) finally maps the interaction result of the second feature map and the Match Query to detection frames and defect classes, producing N groups of detection results, where the number of detection results N is equal to the number of anchors in the Match Query.
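A sketch of MLP heads of the kind described above that map decoded queries to detection frames and defect classes; the layer sizes, the number of queries, and the number of defect classes are assumptions:

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Maps N decoded query embeddings to N (box, class) predictions."""
    def __init__(self, d_model: int = 256, num_classes: int = 10):
        super().__init__()
        # 3-layer MLP for the box (center x, center y, width, height)
        self.box_mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4),
        )
        # linear layer for the defect class logits ("+1" for a no-object class)
        self.class_head = nn.Linear(d_model, num_classes + 1)

    def forward(self, queries: torch.Tensor):
        boxes = self.box_mlp(queries).sigmoid()   # normalized box coordinates in [0, 1]
        logits = self.class_head(queries)
        return boxes, logits

head = DetectionHead()
decoded = torch.randn(1, 300, 256)                # N = 300 decoded queries (assumed)
boxes, logits = head(decoded)                     # (1, 300, 4), (1, 300, num_classes + 1)
```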
Step S205, determining a defect detection result of the distribution line picture according to the detection results.
In practical application, each detection result also includes a confidence value for its detection frame. After the plurality of detection results are determined, the detection frames and defect classes whose confidence exceeds a set confidence threshold are output, yielding the defect detection result of the distribution line picture.
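A small sketch of the confidence thresholding step; the 0.5 threshold and the presence of a trailing no-object class are assumptions:

```python
import torch

def filter_detections(boxes, logits, conf_threshold: float = 0.5):
    """Keep detections whose highest class probability exceeds the threshold."""
    probs = logits.softmax(dim=-1)[..., :-1]          # drop the no-object class
    scores, labels = probs.max(dim=-1)
    keep = scores > conf_threshold
    return boxes[keep], labels[keep], scores[keep]
```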
The training process of the target detection model used in the defect detection method for a distribution line in the present application is described in detail below through different embodiments.
Fig. 3 shows a schematic flow chart of training a detection model.
Step S301, marking a defect target on the data set, and extracting features of the marked data set to obtain a third feature map.
In the model training phase, the training of the target detection model needs a data set containing labels, and the data set comprises information such as distribution line pictures and the positions, the categories and the like of corresponding defect targets, so that the data set for training the target detection model needs to be manufactured before the model training is started.
In practical application, the data set is manufactured by collecting different distribution line pictures and marking the collected distribution line pictures. Distribution line pictures can be acquired through different approaches, such as using an unmanned aerial vehicle, installing a camera, or from an existing image library, and it is necessary to ensure that the acquired distribution line pictures cover various distribution lines and possible defect types.
After the distribution line pictures are acquired, preprocessing the acquired distribution line pictures, wherein the preprocessing operation may include:
image scaling: adjusting the distribution line picture to a fixed size, and generally scaling the distribution line picture to an input size required by a target detection model;
image normalization: carrying out normalization processing on the pixel value of the distribution line picture to enable the pixel value to fall within a specific range, wherein the normalization method can adopt a mode of dividing the pixel value of the distribution line picture by the width and height values of the distribution line picture;
image channel order: according to the requirement of the target detection model, the channel sequence of the distribution line pictures is adjusted, and the common channel sequence is that the channel sequence of the distribution line pictures is adjusted from RGB to BGR.
After preprocessing the distribution line pictures, manually or semi-automatically marking the positions and defect types of the defects aiming at each preprocessed distribution line picture, wherein marking can be completed through a professional marking tool or platform, and marking accuracy and consistency are ensured.
In practical applications, the size of the data set may also affect the performance of the model, and a larger data set is helpful to improve the generalization capability of the model, based on which, the collected distribution line picture may also be processed by a data enhancement mode, where the data enhancement mode may include:
- (1) Horizontal flipping: the distribution line image is flipped along its vertical central axis, which simulates different horizontal placements of objects, increases sample diversity, and helps the model better understand objects from different viewing angles.
- (2) Vertical flipping: the distribution line image is flipped along its horizontal central axis, which simulates different vertical placements of objects and increases sample diversity.
- (3) Color tone change: the color appearance of the distribution line image is changed by adjusting parameters such as brightness, contrast, and saturation, so that the model adapts better to different illumination conditions and environments.
- (4) Rotation enhancement: the distribution line image is randomly rotated by an angle (typically between 0 and 360 degrees) or rotated by several fixed angles, which simulates observing objects from different angles.
Processing the acquired distribution line pictures with these data enhancement methods generates additional training samples and enlarges the training set, which reduces the model's risk of overfitting, helps it generalize better to unseen data, and allows it to adapt to different data sources and environments, thereby improving model performance.
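A sketch of the four augmentations above using torchvision transforms; the probabilities and parameter ranges are assumptions, and box coordinates would need to be transformed alongside the image in a full pipeline:

```python
import torchvision.transforms as T

# Horizontal flip, vertical flip, color jitter and random rotation, applied with
# some probability to each training picture (bounding boxes must be updated
# accordingly whenever a flip or rotation is applied).
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                                 # flip along the vertical central axis
    T.RandomVerticalFlip(p=0.5),                                   # flip along the horizontal central axis
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),   # color tone change
    T.RandomRotation(degrees=15),                                  # rotation enhancement
])
```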
After the labeling of the dataset is completed, the dataset is input to a convolutional layer (e.g., a ResNet-50 network) for feature extraction, so that a third feature map can be obtained.
In one embodiment, the defects in the distribution line pictures of the dataset can be annotated with a labeling tool: a rectangular frame is drawn at the position of each defect, and the position and size of the rectangular frame can be adjusted as needed so that it covers the region where the defect is located. A corresponding class label can be added to the rectangular frame to describe the defect category, and in practice a position label is also added in the format of the rectangle's center point coordinates, width, and height.
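An illustrative annotation record matching this description; the field names, file name, and defect class names are hypothetical:

```python
# One labeled distribution line picture; each defect is given by its center point,
# width, height (possibly normalized by the picture size) and a class label.
annotation = {
    "image": "line_0001.jpg",          # hypothetical file name
    "boxes": [
        {"cx": 0.42, "cy": 0.31, "w": 0.10, "h": 0.08, "label": "broken_insulator"},
        {"cx": 0.70, "cy": 0.55, "w": 0.05, "h": 0.12, "label": "loose_clamp"},
    ],
}
```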
Step S302, inputting the third feature map to the coding layer in the target detection model to obtain a fourth feature map and a second position code.
Here, the extracted third feature map is input to the encoding layer of the target detection model, which is composed of self-attention modules and a feedforward neural network. After multi-layer self-attention calculation and feature extraction by the feedforward neural network, a fourth feature map is obtained.
And after the fourth feature map is extracted, carrying out position coding on the fourth feature map, so as to obtain a second position code corresponding to the fourth feature map.
It will be appreciated that in the object detection task, in addition to identifying the class of objects, it is also necessary to determine the position of the object in the image, the primary purpose of the position coding being to enable the model to understand and take into account the position information of the object in order to accurately locate the object. Position coding is typically achieved by embedding position information in the fourth feature map. These position-coding vectors are learned parameters that contain information about the position so that the model can focus on features at different positions in the image based on these encodings.
In practical application, a common position coding method is to use a combination of sine and cosine functions, and the coding method can ensure that codes of different positions have a certain distance in a vector space, so that a model can identify a position relation through the position codes.
Step S303, the fourth feature map and the second position code are input to the decoding layer in the target detection model to train the decoding layer in the target detection model.
Here, the decoding layer of the object detection model is composed of four layers of self-attention modules, cross-attention modules, and feedforward neural networks. In the training process of the target detection model, a decoding layer of the target detection model comprises a Match Query and a Denoise Query, wherein the Match Query consists of N preset anchors and corresponding class labels, and the Denoise Query consists of M generated noisy real target frames and corresponding class labels.
It should be noted that, the anchor in the Match Query is obtained by setting according to a priori experience, and is a bounding box for predicting the position and class of each target in the target detection task; the real target frame is a position frame of the real defect marked by the dataset for model training.
In practical application, the Match Query and the Denoise Query guide the target detection model in performing detection: by associating their information with the features in the image, the model predicts the class and position of each detection frame and thereby obtains the final prediction result.
In the present embodiment, the fourth feature map and the second position code are input into the decoding layer of the target detection model, and the randomly initialized content query vector, together with the Match Query and the Denoise Query, processes the fourth feature map and the second position code to train the decoding layer. In practice, training the decoding layer essentially means continuously and iteratively adjusting the content query vector and the matching query vectors so that the performance of the target detection model meets the set training requirement, i.e., the target detection model maintains high detection accuracy.
In one embodiment, after the labeling of the dataset is completed, the Denoise Query can be generated from the labeling result: M noisy real target frames and class labels are produced by applying jitter to the real target frames and class labels in the labeling result. A noisy real target frame is obtained by offsetting the center point of a real target frame and scaling its width and height, and a noisy class label is obtained by randomly flipping the original class label. The jitter can be expressed as

$q_{dn} = \big(x + \Delta x,\; y + \Delta y,\; w',\; h',\; l'\big), \quad |\Delta x| \le \tfrac{\lambda_1 w}{2},\quad |\Delta y| \le \tfrac{\lambda_1 h}{2},$

$w' \in \big[(1-\lambda_2)\,w,\ (1+\lambda_2)\,w\big], \quad h' \in \big[(1-\lambda_2)\,h,\ (1+\lambda_2)\,h\big], \quad l' = \mathrm{flip}(l)$

where $(x, y)$ is the center point of the real target frame, $\Delta x$ and $\Delta y$ denote the offsets of the real target frame coordinates in the $x$ and $y$ directions, $w$ and $h$ denote the width and height of the real target frame, $\lambda_1$ and $\lambda_2$ are custom hyperparameters in the range $[0,1]$, $w'$ and $h'$ denote the width and height after jitter, $l'$ denotes the label value after jitter, $l$ denotes the original label value of the real target frame, $\mathrm{flip}(\cdot)$ denotes the random-flip jitter applied to the label, and $q_{dn}$ denotes each query composing the Denoise Query.
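A sketch of generating one group of noisy queries by jittering ground-truth boxes and randomly flipping labels, consistent with the description above; the exact jitter bounds, flip probability, and hyperparameter values are assumptions:

```python
import torch

def make_denoise_queries(gt_boxes, gt_labels, num_classes,
                         lambda1=0.4, lambda2=0.4, flip_prob=0.2):
    """gt_boxes: (M, 4) as (cx, cy, w, h); gt_labels: (M,) class indices."""
    cx, cy, w, h = gt_boxes.unbind(-1)
    # center offset bounded by lambda1 times the box size
    dx = (torch.rand_like(cx) * 2 - 1) * lambda1 * w / 2
    dy = (torch.rand_like(cy) * 2 - 1) * lambda1 * h / 2
    # width/height scaled within [(1 - lambda2), (1 + lambda2)]
    w_n = w * (1 + (torch.rand_like(w) * 2 - 1) * lambda2)
    h_n = h * (1 + (torch.rand_like(h) * 2 - 1) * lambda2)
    noisy_boxes = torch.stack([cx + dx, cy + dy, w_n, h_n], dim=-1)
    # random label flip: with probability flip_prob replace the label by a random class
    flip = torch.rand_like(gt_labels, dtype=torch.float32) < flip_prob
    random_labels = torch.randint_like(gt_labels, num_classes)
    noisy_labels = torch.where(flip, random_labels, gt_labels)
    return noisy_boxes, noisy_labels
```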
Fig. 4 shows a flow chart of processing training results output by the detection model.
In step S401, a first defect prediction result generated by the target detection model performing attention calculation based on the matching query vector, the fourth feature map and the second position code, and a second defect prediction result generated by performing attention calculation based on the denoising query vector, the fourth feature map and the second position code are obtained.
Step S402, a first loss function is calculated according to the first defect prediction result and the labeling result of the data set, and a second loss function is calculated according to the second defect prediction result and the labeling result of the data set.
Step S403, determining whether to continue iterative training on the target detection model according to the first loss function and the second loss function.
Here, model training also involves computing a loss function, which measures the difference between the model's prediction result and the real label.
After the fourth feature map and the second position code are input to a decoding layer of the target detection model, the decoding layer performs attention calculation on the fourth feature map and the second position code by combining the Match Query and the Denoise Query, so that a corresponding defect prediction result is obtained.
During the model training stage, the defect prediction results output by the target detection model are divided into a first defect prediction result and a second defect prediction result. The first defect prediction result is obtained by the decoding layer by associating the Match Query with the fourth feature map and the second position code; the second defect prediction result is obtained by associating the Denoise Query with the fourth feature map and the second position code. Both prediction results contain prediction frames and prediction labels, where a prediction frame indicates the predicted position of a defect and a prediction label indicates its class. Accordingly, the loss function of the target detection model is divided into two parts: a first loss function computed from the Match Query outputs and a second loss function computed from the Denoise Query outputs.
The first loss function is calculated from the first defect prediction result and the labeling result of the dataset, so that the difference between the prediction and the real result can be determined. First, the first defect prediction results are matched to the labeling results of the dataset by the Hungarian algorithm:

$\hat{\sigma} = \arg\min_{\sigma} \sum_{i=1}^{N} \mathcal{L}_{match}\big(y_i, \hat{y}_{\sigma(i)}\big)$

where $N$ denotes the number of output first defect prediction results, $y_i$ denotes the corresponding labeling result, $\hat{y}_{\sigma(i)}$ denotes the first defect prediction result, and $\mathcal{L}_{match}$ denotes the Hungarian matching cost. The first loss function is then computed over the matched results:

$\mathcal{L}_{1} = \sum_{i=1}^{N} \Big[ -\log \hat{p}_{\hat{\sigma}(i)}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, \mathcal{L}_{box}\big(b_i, \hat{b}_{\hat{\sigma}(i)}\big) \Big]$

where $\hat{p}_{\hat{\sigma}(i)}(c_i)$ denotes the probability that the matched prediction is of class $c_i$, $\mathcal{L}_{box}$ denotes the loss between the prediction frame and the real target frame when the matched class $c_i$ is not empty, $b_i$ denotes the real target frame, and $\hat{b}_{\hat{\sigma}(i)}$ denotes the prediction frame. The prediction frame loss is

$\mathcal{L}_{box}\big(b_i, \hat{b}_{\hat{\sigma}(i)}\big) = \lambda_{L1}\, \big\| b_i - \hat{b}_{\hat{\sigma}(i)} \big\|_1 + \lambda_{iou}\, \mathcal{L}_{iou}\big(b_i, \hat{b}_{\hat{\sigma}(i)}\big)$

where the parameters $\lambda_{L1}$ and $\lambda_{iou}$ are hyperparameters that normalize the L1 loss and the IOU loss, and $\mathcal{L}_{iou}$ is the IOU loss.
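A sketch of the Hungarian matching step using SciPy's linear_sum_assignment; the cost here combines only the class probability and the L1 box distance, a simplification of the full matching cost above:

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """pred_probs: (N, C), pred_boxes: (N, 4), gt_labels: (K,), gt_boxes: (K, 4)."""
    # cost: negative probability of the true class plus L1 distance between boxes
    cost_class = -pred_probs[:, gt_labels]                       # (N, K)
    cost_box = torch.cdist(pred_boxes, gt_boxes, p=1)            # (N, K)
    cost = cost_class + cost_box
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return pred_idx, gt_idx                                      # matched pairs
```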
The second loss function is calculated from the second defect prediction result and the labeling result of the dataset, so that the difference between the prediction and the real result can be determined. The second loss function is computed as

$\mathcal{L}_{2} = \sum_{i=1}^{P} \sum_{j=1}^{M} \Big[ -\log \hat{p}_{i,j}(c_{i,j}) + \mathcal{L}_{box}\big(b_{i,j}, \hat{b}_{i,j}\big) \Big]$

where $\mathcal{L}_{2}$ denotes the second loss function, $P$ denotes the number of groups into which the Denoise Query is divided, $M$ denotes the number of outputs per group, $\hat{p}_{i,j}(c_{i,j})$ denotes the predicted class probability of the $j$-th output of the $i$-th group in the second defect prediction result, $b_{i,j}$ denotes the $j$-th real target frame of the $i$-th group, and $\hat{b}_{i,j}$ denotes the prediction frame of the $j$-th output of the $i$-th group.
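A sketch of the denoising loss: because every noisy query is derived from a known real target frame, no matching is needed and the loss is summed directly over the P groups and the M queries per group; using only an L1 box term is a simplification:

```python
import torch
import torch.nn.functional as F

def denoise_loss(pred_logits, pred_boxes, gt_labels, gt_boxes):
    """pred_logits: (P, M, C), pred_boxes: (P, M, 4); gt_labels: (M,), gt_boxes: (M, 4)
    are broadcast over the P noise groups."""
    P, M, C = pred_logits.shape
    cls_loss = F.cross_entropy(pred_logits.reshape(P * M, C),
                               gt_labels.repeat(P))
    box_loss = F.l1_loss(pred_boxes, gt_boxes.unsqueeze(0).expand(P, M, 4))
    return cls_loss + box_loss
```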
The loss function is used to optimize parameters of the model, and by minimizing the loss function, the target detection model can adjust the parameters to bring the predicted outcome closer to the true outcome. In addition, the loss function can also be used for evaluating the performance of the target detection model, in the training process, the value of the loss function can reflect the fitting degree of the model on training data, and a lower loss function value generally indicates that the fitting effect of the model on the training data is good.
By calculating the first loss function and the second loss function, it can be determined how the target detection model performs after training and whether its parameters need to be adjusted, i.e., whether the model meets the set performance requirement and achieves sufficiently accurate detection. If the calculated first and second loss functions indicate that the performance of the target detection model does not meet the set requirement, iterative training of the target detection model continues.
In one embodiment, FIG. 5 shows a schematic flow diagram of the detection model processing the input data.
In step S501, an attention mask matrix is created.
In step S502, the attention weight matrix is processed with the attention mask matrix to calculate self-attention.
An attention mask (Attention Mask) is also provided in the self-attention module to prevent information leakage. During attention weight calculation, the Denoise Query part contains the information of real target frames and class labels; if the Match Query attended to the Denoise Query, this real information would leak. Therefore, during training, the attention scope of the Match Query must be restricted so that the Match Query ignores the Denoise Query. Since the Denoise Query already contains the real target frames and class labels, leaving its own attention scope unrestricted does not greatly affect the result.
In practical application, the attention mask is a matrix whose entries take the value 0 or 1, which controls which positions the model may consider during self-attention calculation: if the value at a position is 0, the model attends to the information at that position; if it is 1, the model ignores that position. The attention mask is constructed as follows:
$A = [a_{ij}]$ is a square matrix of size $(P \times M + N) \times (P \times M + N)$, where $P$ denotes the number of Denoise Query groups, $M$ denotes the number of queries contained in each Denoise Query group, and $N$ denotes the number of queries in the Match Query. $a_{ij} = 1$ means that the $i$-th query cannot see the $j$-th query, and $a_{ij} = 0$ means that the $i$-th query can see the $j$-th query.
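A sketch of building such a mask, assuming the query sequence is ordered as P groups of M denoising queries followed by N matching queries; masking the denoising groups from one another is common practice in denoising training and is an assumption here:

```python
import torch

def build_attention_mask(P: int, M: int, N: int) -> torch.Tensor:
    """Return a (P*M + N, P*M + N) mask; mask[i, j] = 1 means query i cannot see query j."""
    size = P * M + N
    mask = torch.zeros(size, size)
    dn = P * M
    # matching queries must not see any denoising query (prevents ground-truth leakage)
    mask[dn:, :dn] = 1
    # each denoising group sees only its own group among the denoising queries
    for g in range(P):
        mask[g * M:(g + 1) * M, :dn] = 1
        mask[g * M:(g + 1) * M, g * M:(g + 1) * M] = 0
    return mask
```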
In the self-attention module, the calculation of self-attention can be divided into three steps: the linear transformation of the query, key and value is computed, the attention weights are computed, and the weighted summation yields the final attention representation.
First, the query, key, and value are linearly transformed to obtain the query vector, key vector, and value vector.
Next, the attention weight is calculated: the query vector is multiplied by the transpose of the key vector, the result is scaled, and a softmax function is applied. The resulting attention weight matrix can be expressed as $W = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)$, where $d$ is a hyperparameter with a default value of 512.
And finally, multiplying the attention weight matrix by the value vector matrix to obtain a weighted summation result.
When computing the attention weight, the attention mask matrix can be applied to the attention weight matrix to mask invalid positions. A 1 in the attention mask matrix indicates an invalid position, and a 0 indicates a valid position. After the attention mask matrix is applied and the softmax operation is performed, the attention weights of invalid positions are set to 0, so that the values of invalid positions are ignored in the weighted summation. The attention weight matrix processed by the attention mask matrix can be expressed as $W_{mask} = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}} \cdot A\right)$, where $A$ represents the attention mask matrix.
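A sketch of scaled dot-product attention with the mask applied; here masked positions are suppressed by adding negative infinity before the softmax, a common equivalent of zeroing their weights:

```python
import math
import torch

def masked_self_attention(Q, K, V, attn_mask):
    """Q, K, V: (L, d); attn_mask: (L, L) with 1 at positions that must be ignored."""
    d = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)        # raw attention scores
    scores = scores.masked_fill(attn_mask.bool(), float("-inf"))
    weights = scores.softmax(dim=-1)                       # masked positions get weight 0
    return weights @ V
```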
The processing of input data by the decoding layer of the object detection model during the training phase is described below in connection with the architecture diagram of the object detection model of fig. 1.
During training, the decoding layer position-encodes the Match Query and the Denoise Query; in this embodiment a sine function is used, so that corresponding position information is added for each position in the Match Query and the Denoise Query. The position code is generated as

$P_q = \mathrm{PE}(q)$

where $\mathrm{PE}(\cdot)$ denotes the position encoding calculation, $q$ consists of the Match Query and the Denoise Query during training, $d$ is a hyperparameter with a default value of 512, and the position index is the position of each query in the sequence $q$, i.e., each query has an independent position code.

After the position encoding calculation is completed, the obtained $P_q$ is input into a fully connected layer (MLP, Fully Connected Layer), which outputs $Q_p = \mathrm{MLP}(P_q)$.

The output $Q_p$ of the fully connected layer is added to the content query vector $q_c$ to obtain the first input $Q_{self}$ and the second input $K_{self}$ of the self-attention module, where $Q_{self} = q_c + Q_p$ and $K_{self} = q_c + Q_p$. It should be noted that the initial value of $q_c$ is random; as the model is trained, the value of $q_c$ is adjusted, i.e., $q_c$ is a parameter to be learned.

The self-attention module also has a third input $V_{self}$, where $V_{self} = q_c$. The first input $Q_{self}$, the second input $K_{self}$, and the third input $V_{self}$ are fed into the self-attention module, which outputs $O_{self} = \mathrm{SelfAttention}\big(Q_{self}, K_{self}, V_{self}, A\big)$, where $A$ denotes the attention mask matrix.

The output $O_{self}$ of the self-attention module and $q_c$ are input to the addition-and-normalization module (Add & Norm). Add & Norm consists of two main components: an addition operation and a normalization operation. In the addition operation, the two inputs are added together; the purpose of this operation is to let information propagate through the network, especially in deep networks, which helps alleviate the problems of vanishing and exploding gradients. In the normalization operation, the addition result is normalized to reduce the effect of internal covariate shift; techniques such as batch normalization (Batch Normalization) or layer normalization (Layer Normalization) can be used to normalize the feature values of the addition result and improve the stability and training effect of the network.

After Add & Norm produces its output, that output is used to update $q_c$, i.e., $q_c \leftarrow \mathrm{Norm}\big(q_c + O_{self}\big)$, where $\mathrm{Norm}(\cdot)$ denotes the feature normalization.
Next, the fourth feature map and the second position code input to the decoding layer are processed by the cross-attention module in the decoding layer. Specifically, the cross-attention module has three inputs: a fourth input $Q_{cross}$, a fifth input $K_{cross}$, and a sixth input $V_{cross}$.

For the fourth input $Q_{cross}$: $q_c$ is first input into an MLP to obtain the corresponding output $\mathrm{MLP}(q_c)$; the MLP output is dot-multiplied with the position code $P_q$ to obtain $P_q \odot \mathrm{MLP}(q_c)$; and the dot-product result is concatenated with $q_c$ to obtain the fourth input, which can be expressed as $Q_{cross} = \mathrm{Concat}\big(q_c,\ P_q \odot \mathrm{MLP}(q_c)\big)$, where $\mathrm{Concat}(\cdot)$ denotes concatenation of the input features. Note that $q_c$ here refers to the content query vector already updated by the self-attention module.

The fifth input $K_{cross}$ is obtained by concatenating the fourth feature map and the second position code; specifically, $K_{cross} = \mathrm{Concat}\big(F_{enc},\ P_{enc}\big)$, where $F_{enc}$ denotes the features output by the encoding layer, i.e., the fourth feature map, and $P_{enc}$ denotes the position code in the encoding layer, i.e., the second position code.

The sixth input $V_{cross}$ is essentially the fourth feature map, i.e., $V_{cross} = F_{enc}$.

On this basis, the fourth input $Q_{cross}$, the fifth input $K_{cross}$, and the sixth input $V_{cross}$ are fed into the cross-attention module to obtain the corresponding output, where the processing of the cross-attention module can be expressed as $O_{cross} = \mathrm{CrossAttention}\big(Q_{cross}, K_{cross}, V_{cross}\big)$.

The output $O_{cross}$ of the cross-attention module and the updated $q_c$ are input to Add & Norm to obtain the corresponding output $q_c' = \mathrm{Norm}\big(q_c + O_{cross}\big)$.

The output $q_c'$ of Add & Norm is input to a feedforward neural network (FNN, Feedforward Neural Network), which is used for tasks such as pattern recognition and function approximation; it extracts features and processes information through combinations and nonlinear transformations of multiple layers of neurons, and has strong modeling capability and a wide range of applications. After the FNN finishes processing $q_c'$, it outputs $O_{fnn} = \mathrm{FNN}(q_c')$.

The input and output of the FNN are then passed to Add & Norm to obtain the sixth output, specifically expressed as $O_{6} = \mathrm{Norm}\big(q_c' + O_{fnn}\big)$.
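A condensed sketch of one such decoding-layer cycle (masked self-attention, Add & Norm, cross-attention against the encoder features, Add & Norm, FFN, Add & Norm) using PyTorch modules; the concatenation and dot-product details of the query construction above are simplified to additive position embeddings here:

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, q_c, q_pos, enc_feat, enc_pos, attn_mask):
        # self-attention over the queries (content + position), with the attention mask
        sa, _ = self.self_attn(q_c + q_pos, q_c + q_pos, q_c,
                               attn_mask=attn_mask.bool())   # True = cannot attend
        q_c = self.norm1(q_c + sa)                           # Add & Norm
        # cross-attention against the encoder output (fourth feature map + position code)
        ca, _ = self.cross_attn(q_c + q_pos, enc_feat + enc_pos, enc_feat)
        q_c = self.norm2(q_c + ca)                           # Add & Norm
        q_c = self.norm3(q_c + self.ffn(q_c))                # FFN + Add & Norm
        return q_c

# toy call: 320 queries (e.g. 2 groups x 10 denoise + 300 match), 400 encoder tokens
layer = DecoderLayer()
out = layer(torch.randn(1, 320, 256), torch.randn(1, 320, 256),
            torch.randn(1, 400, 256), torch.randn(1, 400, 256),
            torch.zeros(320, 320))   # all-zero mask: nothing masked in this toy call
```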
The target detection model contains four layers of this calculation cycle in total. After the four layers, the final output is passed through several FNN layers to obtain the defect prediction result, which comprises the prediction frames and prediction class labels.

In one embodiment, after one layer of the calculation cycle produces $O_{6}$, it is used to update $q_c$ as well as the Match Query and the Denoise Query. Specifically, $q_c$ is updated as $q_c \leftarrow O_{6}$; for the Match Query and the Denoise Query, $O_{6}$ is first input to an MLP layer and the queries are updated with the resulting output.

It should be noted that after training of the target detection model is completed, input distribution line pictures are still processed through the above steps when the model is applied. The difference from the model training stage is that, in the model application stage, the decoding layer of the target detection model contains only the Match Query and no Denoise Query.
In the embodiment, the target detection model is trained by adopting the pre-configured matching query vector and denoising query vector, so that the model training efficiency can be improved, and meanwhile, the performance of the target detection model is improved, so that the defects in the distribution line picture can be accurately identified.
In one embodiment, a defect detection device for a distribution line is provided. Referring to FIG. 6, a defect detection device 600 for a distribution line may include: a first acquisition module 601, a second acquisition module 602, a third acquisition module 603, a detection module 604, and a determination module 605.
The first obtaining module 601 is configured to obtain a preprocessed distribution line picture; the second obtaining module 602 is configured to perform feature extraction on the preprocessed distribution line picture to obtain a first feature map; the third obtaining module 603 is configured to input the first feature map into the encoding layer of the target detection model to obtain a second feature map and a first position code; the detection module 604 is configured to input the second feature map and the first position code into the decoding layer of the target detection model to obtain a plurality of detection results, where the decoding layer is trained by having the model learn a randomly initialized content query vector and by updating the matching query vectors with the assistance of denoising query vectors, and the denoising query vectors are removed after model training is completed; the determining module 605 is configured to determine the defect detection result of the distribution line picture according to the plurality of detection results.
In one embodiment, when the decoding layer of the target detection model performs training by adopting a mode of model learning random initialized content query vectors and updating matching query vectors by a mode of denoising query vector auxiliary training, the detection module 604 is specifically configured to perform defect target labeling on a data set, and perform feature extraction on the data set with the labeling completed to obtain a third feature map; the data set comprises a plurality of distribution line pictures for model training; inputting the third feature map to an encoding layer in a target detection model to obtain a fourth feature map and a second position code; inputting the fourth feature map and the second position code to a decoding layer in the object detection model to train the decoding layer in the object detection model; the decoding layer comprises a randomly initialized content query vector, a pre-configured matching query vector and a denoising query vector.
In one embodiment, the detection module 604 is specifically configured to create an attention mask matrix when the fourth feature map and the second position code are input to the decoding layer in the object detection model to train the decoding layer in the object detection model; and processing the attention weight matrix by using the attention mask matrix to calculate self-attention, wherein the attention mask matrix is used for enabling the matching query vector to ignore the denoising query vector in the process of calculating self-attention so as to prevent information leakage.
In one embodiment, the detection module 604 is specifically configured to obtain a first defect prediction result generated by performing attention calculation based on the matching query vector, the fourth feature map, and the second position code by the target detection model, and a second defect prediction result generated by performing attention calculation based on the denoising query vector, the fourth feature map, and the second position code;
calculating a first loss function according to the first defect prediction result and the labeling result of the data set, and calculating a second loss function according to the second defect prediction result and the labeling result of the data set;
And determining whether iterative training of the target detection model is required to be continued or not according to the first loss function and the second loss function.
In one embodiment, the detection module 604 is specifically configured to apply a center-point offset to the rectangular boxes that mark defect targets in the labeling result of the data set, and to dither the class labels of the marked defect targets by random flipping, so as to obtain the denoising query vectors.
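A minimal sketch of how the denoising queries could be produced from the labelled boxes and classes; the noise scale, flip probability, and box format are assumptions.

```python
import torch

def make_denoising_queries(gt_boxes, gt_labels, num_classes,
                           box_noise_scale=0.4, label_flip_prob=0.2):
    """Add noise to ground-truth boxes and labels to form denoising queries.

    gt_boxes: (N, 4) in (cx, cy, w, h) format, normalised to [0, 1];
    gt_labels: (N,) integer class ids.
    """
    noised_boxes = gt_boxes.clone()
    # centre-point offset: shift each centre by up to a fraction of the box size
    offset = (torch.rand_like(gt_boxes[:, :2]) * 2 - 1) * gt_boxes[:, 2:] * box_noise_scale * 0.5
    noised_boxes[:, :2] = (noised_boxes[:, :2] + offset).clamp(0.0, 1.0)

    # label dithering: randomly flip some class labels to a different random class
    flip = torch.rand(gt_labels.shape) < label_flip_prob
    random_labels = torch.randint(0, num_classes, gt_labels.shape)
    noised_labels = torch.where(flip, random_labels, gt_labels)

    return noised_boxes, noised_labels
```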
In one embodiment, the detection module 604 is specifically configured to perform position encoding on the matching query vectors and the denoising query vectors, and input the encoded result A into a multi-layer perceptron (MLP) layer to obtain a first output P;
input a first input Q_self, a second input K_self, and a third input V_self into the self-attention module to obtain a second output O_self, where Q_self = C + P, K_self = C + P, and V_self = C, with C denoting the content query vector;
perform residual connection and normalization on the content query vector C input to the self-attention module and the second output O_self, and update the content query vector C according to the result;
input a fourth input Q_cross, a fifth input K_cross, and a sixth input V_cross into the cross-attention module to obtain a third output O_cross, where Q_cross = Concat(C, MLP(C)), K_cross = Concat(F, E), and V_cross = F; here MLP(C) denotes the output obtained by inputting the content query vector C into the MLP layer, F denotes the fourth feature map, E denotes the second position code, and Concat denotes a splicing operation;
perform residual connection and normalization on the content query vector C input to the cross-attention module and the third output O_cross to obtain a fourth output;
input the fourth output into a feedforward neural network (FFN) for processing to obtain a fifth output;
and perform residual connection and normalization on the fourth output and the fifth output to obtain a sixth output, then enter the next calculation cycle, until four calculation cycles are completed, so as to obtain a defect prediction result.
In one embodiment, the detection module 604 is specifically configured to update the content query vector C according to the sixth output, and to input the sixth output into an MLP layer and update the matching query vectors and the denoising query vectors according to its output.
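Putting the steps above together, a single-head, batch-free sketch of one decoding cycle could look as follows; the model width, MLP depths, and the plain dot-product attention are assumptions made for clarity, and the actual decoder presumably uses multi-head attention.

```python
import torch
import torch.nn as nn

def attention(q, k, v):
    """Plain scaled dot-product attention (single head, no batch dimension, for clarity)."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return scores.softmax(dim=-1) @ v

class DecoderLayerSketch(nn.Module):
    """One decoding cycle as described above (dimensions and MLP depths are assumptions)."""

    def __init__(self, d_model=256):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                     nn.Linear(d_model, d_model))   # MLP on encoded queries -> P
        self.cross_mlp = nn.Linear(d_model, d_model)                 # MLP applied to C for the cross branch
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, content_q, encoded_q, feat, feat_pos):
        # first output: P = MLP(position-encoded matching + denoising query vectors)
        p = self.pos_mlp(encoded_q)

        # self-attention: Q = C + P, K = C + P, V = C  -> second output
        o_self = attention(content_q + p, content_q + p, content_q)

        # residual connection + normalization, update the content query vector C
        content_q = self.norm1(content_q + o_self)

        # cross-attention: Q = Concat(C, MLP(C)), K = Concat(F, E), V = F  -> third output
        q_cross = torch.cat([content_q, self.cross_mlp(content_q)], dim=-1)
        k_cross = torch.cat([feat, feat_pos], dim=-1)
        o_cross = attention(q_cross, k_cross, feat)

        # residual + norm -> fourth output; FFN -> fifth output; residual + norm -> sixth output
        fourth = self.norm2(content_q + o_cross)
        fifth = self.ffn(fourth)
        sixth = self.norm3(fourth + fifth)
        return sixth
```

Stacking four such cycles, updating the content query vector C from each sixth output, and passing the sixth output through a further MLP to refresh the matching and denoising query vectors would complete the decoder loop sketched above.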
For specific limitations of the defect detection device for the distribution line, reference may be made to the limitations of the defect detection method for the distribution line described above, which are not repeated here. Each of the above modules in the defect detection device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware, be independent of the processor in the computer device, or be stored as software in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
While embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that variations, modifications, substitutions, and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the invention.
Claims (10)
1. A defect detection method for a distribution line, comprising:
acquiring a preprocessed distribution line picture;
extracting features of the preprocessed distribution line pictures to obtain a first feature map;
inputting the first feature map into an encoding layer of a target detection model to obtain a second feature map and a first position code;
inputting the second feature map and the first position code into a decoding layer of the target detection model to obtain a plurality of detection results, wherein the decoding layer of the target detection model is trained by having the model learn randomly initialized content query vectors and update matching query vectors with the aid of denoising query vectors, and the denoising query vectors are removed after model training is completed;
and determining a defect detection result of the distribution line picture according to the plurality of detection results.
2. The method of claim 1, wherein training the decoding layer of the target detection model by having the model learn randomly initialized content query vectors and update the matching query vectors with the aid of denoising query vectors comprises:
performing defect target labeling on a data set, and performing feature extraction on the labeled data set to obtain a third feature map, wherein the data set comprises a plurality of distribution line pictures for model training;
inputting the third feature map to an encoding layer in a target detection model to obtain a fourth feature map and a second position code;
inputting the fourth feature map and the second position code into a decoding layer of the target detection model to train the decoding layer, wherein the decoding layer comprises randomly initialized content query vectors, pre-configured matching query vectors, and denoising query vectors.
3. The method of claim 2, wherein inputting the fourth feature map and the second position code into the decoding layer of the target detection model to train the decoding layer comprises:
creating an attention mask matrix;
and processing the attention weight matrix with the attention mask matrix when calculating self-attention, wherein the attention mask matrix causes the matching query vectors to ignore the denoising query vectors during the self-attention calculation so as to prevent information leakage.
4. The method of claim 2, wherein inputting the fourth feature map and the second position code into the decoding layer of the target detection model to train the decoding layer comprises:
obtaining a first defect prediction result generated by the target detection model through attention calculation based on the matching query vectors, the fourth feature map, and the second position code, and a second defect prediction result generated through attention calculation based on the denoising query vectors, the fourth feature map, and the second position code;
calculating a first loss function from the first defect prediction result and the labeling result of the data set, and a second loss function from the second defect prediction result and the labeling result of the data set;
and determining, from the first loss function and the second loss function, whether iterative training of the target detection model needs to continue.
5. The method of claim 2, wherein, before inputting the fourth feature map and the second position code into the decoding layer of the target detection model to train the decoding layer, the method comprises:
applying a center-point offset to the rectangular boxes that mark defect targets in the labeling result of the data set, and dithering the class labels of the marked defect targets by random flipping, so as to obtain the denoising query vectors.
6. The method of claim 2, wherein inputting the fourth feature map and the second position code into the decoding layer of the target detection model to train the decoding layer comprises:
performing position encoding on the matching query vectors and the denoising query vectors, and inputting the encoded result A into a multi-layer perceptron (MLP) layer to obtain a first output P;
inputting a first input Q_self, a second input K_self, and a third input V_self into the self-attention module to obtain a second output O_self, wherein Q_self = C + P, K_self = C + P, and V_self = C, with C denoting the content query vector;
performing residual connection and normalization on the content query vector C input to the self-attention module and the second output O_self, and updating the content query vector C according to the result;
inputting a fourth input Q_cross, a fifth input K_cross, and a sixth input V_cross into the cross-attention module to obtain a third output O_cross, wherein Q_cross = Concat(C, MLP(C)), K_cross = Concat(F, E), and V_cross = F; MLP(C) denotes the output obtained by inputting the content query vector C into the MLP layer, F denotes the fourth feature map, E denotes the second position code, and Concat denotes a splicing operation;
performing residual connection and normalization on the content query vector C input to the cross-attention module and the third output O_cross to obtain a fourth output;
inputting the fourth output into a feedforward neural network (FFN) for processing to obtain a fifth output;
and performing residual connection and normalization on the fourth output and the fifth output to obtain a sixth output, then entering the next calculation cycle, until four calculation cycles are completed, so as to obtain a defect prediction result.
7. The method of claim 6, wherein, before entering the next calculation cycle, the method further comprises:
updating the content query vector C according to the sixth output;
and inputting the sixth output into an MLP layer and updating the matching query vectors and the denoising query vectors according to its output.
8. A defect detection device for a distribution line, comprising:
the first acquisition module is used for acquiring the preprocessed distribution line pictures;
the second acquisition module is used for carrying out feature extraction on the preprocessed distribution line pictures to obtain a first feature map;
the third acquisition module is used for inputting the first feature map into the encoding layer of the target detection model to obtain a second feature map and a first position code;
the detection module is used for inputting the second feature map and the first position code into a decoding layer of the target detection model to obtain a plurality of detection results, wherein the decoding layer of the target detection model is trained by having the model learn randomly initialized content query vectors and update the matching query vectors with the aid of denoising query vectors, and the denoising query vectors are removed after model training is completed;
and the determining module is used for determining the defect detection result of the distribution line picture according to the plurality of detection results.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311380747.XA CN117115432A (en) | 2023-10-24 | 2023-10-24 | Defect detection method and device for distribution line, electronic equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117115432A true CN117115432A (en) | 2023-11-24 |
Family
ID=88807804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311380747.XA Pending CN117115432A (en) | 2023-10-24 | 2023-10-24 | Defect detection method and device for distribution line, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117115432A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117893613A (en) * | 2024-03-15 | 2024-04-16 | Quanzhou Institute of Equipment Manufacturing | Tray pose estimation method, system and storage medium |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114331965A (en) * | 2021-11-30 | 2022-04-12 | Wuzhong Power Supply Company of State Grid Ningxia Electric Power Co., Ltd. | Overhead transmission line pin missing detection method, medium and system |
CN115311648A (en) * | 2022-07-15 | 2022-11-08 | North China Electric Power University (Baoding) | Visual inseparable bolt defect detection method based on bolt attributes and positions |
Non-Patent Citations (3)
Title |
---|
FENG LI ET AL: "DN-DETR: Accelerate DETR Training by Introducing Query DeNoising", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1-4 *
SHILONG LIU ET AL: "DAB-DETR: Dynamic Anchor Boxes Are Better Queries for DETR", arXiv:2201.12329v4, page 4 *
LI GANG ET AL: "Transmission line bolt defect detection method using DETR fused with prior knowledge", Journal of Graphics, vol. 44, no. 3, pages 440-446 *
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20231124