CN116128965A

CN116128965A - Power cable position detection method based on VIT, electronic equipment and medium

Info

Publication number: CN116128965A
Application number: CN202310146631.3A
Authority: CN
Inventors: 李东有; 王志强; 孙琰; 陈志忠; 雷思宇; 于洪涛; 张航; 李刚; 耿建宇; 闫旭; 李雪峰; 杨钧砚; 杜伟; 姜泓杉; 杜英杰; 张凯; 何昊; 韩冬
Original assignee: Changchun Power Supply Co Of State Grid Jilinsheng Electric Power Supply Co
Current assignee: Changchun Power Supply Co Of State Grid Jilinsheng Electric Power Supply Co
Priority date: 2023-02-21
Filing date: 2023-02-21
Publication date: 2023-05-16

Abstract

The embodiments of the application disclose a VIT-based power cable position detection method, electronic equipment and medium, which can accurately detect the position of a high-altitude power cable by using a pre-trained VIT detection model. Because the VIT detection model has an attention mechanism and needs no prior knowledge, global modeling capability is ensured through a Transformer encoder-decoder structure, one-to-one correspondence between predictions and manual labels is ensured through bipartite graph matching optimization, and target detection frames and category attributes can be obtained directly. The method therefore has higher robustness and stronger generalization capability for power cable detection in outdoor scenes.

Description

Power cable position detection method based on VIT, electronic equipment and medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a power cable position detection method based on VIT, electronic equipment and a medium.

Background

With the rapid development of new technologies and their practical deployment across industries, high-altitude power operations are gradually shifting from manual operation to automated and intelligent operation, meeting power grid companies' needs to reduce staffing, improve efficiency, and lower risk. Intelligent operation requires accurately detecting the position of the high-altitude power cable, and the detection result can further assist the camera in three-dimensional positioning.

However, illumination in outdoor scenes varies widely, and the various metal products near an overhead line introduce interference such as reflections, so conventional visual detection methods based on geometric features cannot cope with such complex situations. Deep learning detection methods based on CNNs (Convolutional Neural Networks) usually depend too heavily on certain local information, which makes them unreliable to some degree. Mainstream target detection algorithms, such as Faster R-CNN (Faster Regions with CNN features) or the YOLO (You Only Look Once) series, require prior anchors and NMS (non-maximum suppression) to be configured, so they are more complex to implement in engineering practice and involve more hyperparameters.

Disclosure of Invention

The embodiments of the application provide a VIT-based power cable position detection method, electronic equipment and medium, which can accurately detect the position of a high-altitude power cable.

In a first aspect, an embodiment of the present application provides a method for detecting a position of a power cable based on VIT, where the method for detecting a position of a power cable based on VIT includes:

acquiring a power cable image and a pre-trained VIT detection model;

extracting features of the power cable image by using a skeleton network of the VIT detection model to obtain target image features of the power cable image;

performing position encoding on the target image features to obtain a target position encoding vector;

encoding the target image feature and the target position encoding vector by utilizing an encoder structure of the VIT detection model to obtain a target encoding feature;

decoding the target coding feature by using a decoder structure and object queries of the VIT detection model to obtain a target decoding feature;

and inputting the target decoding feature to a classification and regression head module of the VIT detection model to obtain a cable prediction position in the power cable image.

In a second aspect, embodiments of the present application further provide a device for detecting a position of a power cable based on VIT, where the device for detecting a position of a power cable based on VIT includes:

the acquisition unit is used for acquiring the power cable image and acquiring a pre-trained VIT detection model;

the extraction unit is used for extracting the characteristics of the power cable image by utilizing the skeleton network of the VIT detection model to obtain the target image characteristics of the power cable image;

the encoding unit is used for performing position encoding on the target image features to obtain a target position encoding vector;

the encoding unit is further configured to encode the target image feature and the target position encoding vector by using an encoder structure of the VIT detection model to obtain a target encoding feature;

the decoding unit is used for decoding the target coding feature by utilizing the decoder structure and the object queries of the VIT detection model to obtain a target decoding feature;

and the prediction unit is used for inputting the target decoding feature to a classification and regression head module of the VIT detection model to obtain the cable prediction position in the power cable image.

In a third aspect, embodiments of the present application also provide an electronic device, including a memory storing at least one instruction; and a processor executing the instructions stored in the memory to implement the above method.

In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium having at least one instruction stored therein, the at least one instruction being executed by a processor in an electronic device to implement the above-described method.

The embodiments of the application provide a VIT-based power cable position detection method, electronic equipment and medium, with the following beneficial effects: the position of the high-altitude power cable is accurately detected by using a pre-trained VIT detection model. Because the VIT detection model has an attention mechanism and needs no prior knowledge, global modeling capability is ensured through a Transformer encoder-decoder structure, one-to-one correspondence between predictions and manual labels is ensured through bipartite graph matching optimization, and the target detection frame and the category attribute can be obtained directly. The method therefore has higher robustness and stronger generalization capability when handling power cable detection in outdoor scenes.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a method for detecting a position of a power cable based on VIT according to an embodiment of the present application;

fig. 2 is a schematic diagram of a functional module of a VIT-based power cable position detecting device according to an embodiment of the present application;

fig. 3 is a schematic block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

The embodiments of the application provide a method for detecting the position of a high-altitude power cable based on a VIT (Vision Transformer), electronic equipment and a medium, wherein the method can be applied to a VIT-based power cable position detection device.

The execution subject of the VIT-based power cable position detection method may be the VIT-based power cable position detection device provided in the embodiments of the present application, or an electronic device integrated with the VIT-based power cable position detection device, where the VIT-based power cable position detection device may be implemented in a hardware or software manner, and the electronic device may be a detector or the like, and in particular may be a control device in the detector, or a terminal or a server having a communication connection with the control device.

Referring to fig. 1, fig. 1 is a schematic flow chart of a method for detecting a position of a power cable based on VIT according to an embodiment of the present application. The power cable position detection method based on the VIT can be applied to a power cable position detection device based on the VIT. Of course, in other embodiments, the VIT-based power cable position detection method may also be applied to other types of detection devices, which is not limited in this application.

Specifically, the VIT-based power cable position detection method includes the following steps S110 to S160.

S110, acquiring a power cable image and acquiring a pre-trained VIT detection model.

In this embodiment, the power cable image may be acquired by a camera or camera device mounted at a designated location, such as by a camera mounted on a utility pole.

In this embodiment, the VIT detection model may be obtained by training a VIT network.

Specifically, before acquiring the pre-trained VIT detection model, the method further includes:

acquiring cable RGB (red, green, and blue) images under different environmental conditions by using an image acquisition device, constructing an image set to be annotated, and acquiring an initial VIT model and a pre-constructed loss function;

labeling each image to be labeled in the image set to be labeled to obtain a sample image set;

training the initial VIT model by using the sample image set, and detecting whether the loss function reaches convergence in the training process;

when the loss function is detected to reach convergence, stopping training to obtain the VIT detection model;

wherein the environmental conditions include light, background, and spatial location.
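The stopping rule in the steps above ("stop training when the loss function reaches convergence") can be sketched as follows. This is only an illustrative sketch: the callback `run_round`, the `patience` and `min_delta` parameters, and the toy loss curve are assumptions introduced here and are not specified in the application.

```python
def train_until_converged(run_round, max_rounds=1000, patience=3, min_delta=1e-4):
    """Run training rounds until the loss stops decreasing.

    run_round(round_index) -> float is a hypothetical callback that trains
    for one round and returns that round's target loss.
    """
    best = float("inf")
    stale = 0  # consecutive rounds without a meaningful decrease
    history = []
    for r in range(max_rounds):
        loss = run_round(r)
        history.append(loss)
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # loss no longer decreasing: treat as converged
    return history


# Toy stand-in for a real training round: the loss decays, then plateaus.
def fake_round(r):
    return max(1.0 / (r + 1), 0.2)


history = train_until_converged(fake_round)
```

Requiring several consecutive rounds without improvement, rather than a single one, is one possible way to avoid stopping on a momentary fluctuation of the loss.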

The labeling of each image to be labeled in the image set to be labeled comprises the following step:

labeling each image to be labeled in the image set to be labeled at a preset distance from the damper. For example, each image to be labeled may be labeled at a position about 1.2 m away from the damper.

Wherein said detecting whether the loss function reaches convergence during training comprises:

acquiring the unordered sets output by the model in each training round, and extracting a plurality of foreground categories and corresponding region-of-interest frames from the unordered sets; acquiring, from the sample image set, a plurality of marking categories and corresponding marking frames that correspond to the foreground categories and the region-of-interest frames;

combining each foreground category and the corresponding region of interest frame into a group of characteristics to obtain a plurality of groups of output characteristics; combining each marking category and the corresponding marking frame into a group of characteristics to obtain a plurality of groups of marking characteristics;

performing index matching between the multiple groups of output features and the multiple groups of marking features by using a bipartite matching algorithm, to obtain the optimal bipartite matching of each training round;

calculating the sum of the predicted category loss and the predicted boundary frame loss based on the optimal bipartite matching of each training round, to obtain the target loss of each training round;

in the training process, when the target loss is detected to be no longer reduced, determining that the loss function is detected to reach convergence;

wherein the predicted category loss is calculated from the foreground category with the highest prediction probability, and the predicted boundary frame loss is a linear combination of the L1 loss and the GIoU loss of the predicted region-of-interest frame.
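The index matching step above can be illustrated with a small sketch. It is an assumption-laden illustration, not the method of the application: the pairwise cost (negative class probability plus L1 box distance), the class name "cable" and the function names are hypothetical, and the exhaustive search over permutations is swapped in for readability. A practical implementation would more likely use the Hungarian algorithm.

```python
from itertools import permutations


def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two (x, y, w, h) boxes."""
    return sum(abs(p - q) for p, q in zip(box_a, box_b))


def match(predictions, labels):
    """Return, for each label, the index of the prediction assigned to it.

    predictions: list of (class_probabilities: dict, box)
    labels:      list of (category, box)
    Every label gets a distinct prediction; the total cost is minimized.
    """
    def cost(pred, label):
        probs, pred_box = pred
        category, label_box = label
        # Assumed cost: reward a high probability for the labeled category,
        # penalize box distance.
        return -probs.get(category, 0.0) + l1_distance(pred_box, label_box)

    best_total, best_assignment = float("inf"), None
    for assignment in permutations(range(len(predictions)), len(labels)):
        total = sum(cost(predictions[p], labels[g]) for g, p in enumerate(assignment))
        if total < best_total:
            best_total, best_assignment = total, assignment
    return best_assignment


predictions = [
    ({"cable": 0.9}, (0.50, 0.50, 0.20, 0.10)),
    ({"cable": 0.2}, (0.10, 0.10, 0.05, 0.05)),
    ({"cable": 0.8}, (0.30, 0.70, 0.20, 0.10)),
]
labels = [
    ("cable", (0.31, 0.69, 0.20, 0.10)),
    ("cable", (0.52, 0.49, 0.20, 0.10)),
]
assignment = match(predictions, labels)  # label 0 <- prediction 2, label 1 <- prediction 0
```

Because each label is paired with exactly one prediction, the matching yields the one-to-one correspondence between predictions and manual labels on which the loss is then computed.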

For example: the 100 unordered sets output by the model in each training round are obtained, and the foreground categories and corresponding bbox coordinates are extracted from them. Let $y$ be the ground truth set, that is, the marking categories and corresponding marking frames; $y_i = (C_i, b_i)$ is a set of network predictions, where $C_i$ represents the predicted category and $b_i$ represents the predicted bounding box (bbox), with $b_i = (x_i, y_i, w_i, h_i)$, where $(x_i, y_i)$ is the center point of the bbox, $w_i$ is the width of the bbox, $h_i$ is the height of the bbox, and $i$ is a positive integer. Further, the index matching between the predicted bbox and the marked bbox is solved by using a bipartite matching algorithm, to obtain the optimal bipartite matching of each training round. Further, a loss function is created on the basis of the optimal bipartite matching, as the sum of the predicted category loss and the predicted boundary frame loss,

wherein $L$ represents the loss function; $p(C_i)$ is the argmax of the predicted category $C_i$, that is, the predicted category loss, and represents the foreground category corresponding to the highest prediction probability; $L_{box}$ represents the prediction loss corresponding to the bbox, that is, the predicted boundary frame loss; if [...], the prediction loss corresponding to the bbox is 0. $L_{box}$ is a linear combination of the L1 loss and the GIoU loss of the bbox prediction, specifically:

$$L_{box} = a \cdot L_{1} + b \cdot L_{GIoU}$$

wherein $L_{1}$ represents the L1 loss, $L_{GIoU}$ represents the GIoU loss, and $a$ and $b$ are hyperparameters: $a$ indicates the importance of the distance error and $b$ indicates the importance of the area error. For example, $a = b = 0.5$ may be configured.
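As a worked illustration of the linear combination above, the following sketch computes the box loss for boxes given as (center x, center y, width, height). It assumes normalized coordinates with positive width and height, and takes the GIoU loss to be 1 minus the GIoU value, which is the usual definition; the function names are illustrative only.

```python
def to_corners(box):
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def giou(box_a, box_b):
    """Generalized IoU of two (cx, cy, w, h) boxes; the value lies in (-1, 1]."""
    ax0, ay0, ax1, ay1 = to_corners(box_a)
    bx0, by0, bx1, by1 = to_corners(box_b)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    return inter / union - (hull - union) / hull


def box_loss(pred, target, a=0.5, b=0.5):
    """L_box = a * L1 + b * L_GIoU, with L_GIoU taken as 1 - GIoU."""
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    l_giou = 1.0 - giou(pred, target)
    return a * l1 + b * l_giou


same = box_loss((0.5, 0.5, 0.2, 0.1), (0.5, 0.5, 0.2, 0.1))
far = box_loss((0.2, 0.2, 0.2, 0.2), (0.8, 0.8, 0.2, 0.2))
```

Unlike plain IoU, the GIoU term still changes with distance when the two boxes do not overlap, so the area term remains informative for badly placed predictions.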

With the configured loss function, the foreground category is matched first during training and the predicted boundary frame is matched next, which effectively improves the training effect of the model and the accuracy of the VIT detection model obtained through training.

S120, extracting features of the power cable image by using the skeleton network (that is, the backbone network) of the VIT detection model to obtain the target image features of the power cable image.

In this embodiment, the feature extraction of the power cable image by using the skeleton network of the VIT detection model, to obtain the target image feature of the power cable image includes:

scaling the power cable image to obtain a first feature;

determining the data volume of each batch of data, and constructing a second feature according to the data volume of each batch of data and the first feature;

inputting the second features into the skeleton network for feature extraction to obtain third features;

and performing dimension reduction processing on the third feature by using a 1*1 convolution layer to obtain the target image feature.

For example: a single power cable image is resized to (c, h_0, w_0), and a batch with shape = (b, c, h_0, w_0) is constructed and input into the skeleton network for feature extraction; the extracted feature map is then reduced in dimension by a 1*1 convolution, and the output has shape = (b, d, h_1, w_1). Here c = 3 is the number of channels; h_0 = 800 is the picture height; w_0 = 1200 is the picture width; b = 16 is the number of pictures in one batch; h_1 = h_0 // 32, w_1 = w_0 // 32, and d = 256. The transformed picture has h_1 × w_1 positions, each characterized by a d-dimensional vector.

Where b may be configured according to the GPU (Graphics Processing Unit, graphics processor) performance of the system.
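The shape bookkeeping of the example above can be checked with a few lines of arithmetic. The function name and the `stride` parameter are illustrative assumptions; the numeric values are those given in the example.

```python
def backbone_output_shape(b=16, c=3, h_0=800, w_0=1200, d=256, stride=32):
    """Return the input batch shape, the output feature shape and the
    number of spatial positions after the 1*1 convolution."""
    batch_shape = (b, c, h_0, w_0)
    # Integer division, as in h_1 = h_0 // 32 and w_1 = w_0 // 32.
    h_1, w_1 = h_0 // stride, w_0 // stride
    feature_shape = (b, d, h_1, w_1)
    return batch_shape, feature_shape, h_1 * w_1


batch_shape, feature_shape, num_positions = backbone_output_shape()
# feature_shape is (16, 256, 25, 37), so each picture has 25 * 37 = 925 positions
```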

S130, performing position encoding on the target image features to obtain a target position encoding vector.

In this embodiment, the performing position encoding on the target image feature to obtain a target position encoding vector includes:

encoding the even-indexed dimensions of the target image features by using a sine function, and encoding the odd-indexed dimensions by using a cosine function, to obtain the target position encoding vector.

In the above embodiment, the position encoding is extended from one dimension to two dimensions, that is, positions are encoded along both the horizontal axis and the vertical axis, to obtain the target position encoding vector.
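A minimal sketch of such a two-dimensional sine/cosine position encoding is given below. The application does not state the frequency schedule or how the two axes are combined, so the temperature of 10000 and the concatenation of a vertical half and a horizontal half are assumptions borrowed from common Transformer practice; the function names are illustrative.

```python
import math


def sincos_1d(pos, dim, temperature=10000.0):
    """Encode one scalar position into `dim` values:
    sine at even indices, cosine at odd indices."""
    vec = []
    for k in range(dim):
        angle = pos / (temperature ** (2 * (k // 2) / dim))
        vec.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vec


def position_encoding_2d(h_1, w_1, d):
    """Return an h_1 x w_1 grid of d-dimensional vectors (d must be even).

    Assumption: the first d/2 values encode the vertical position and the
    last d/2 values encode the horizontal position.
    """
    half = d // 2
    return [[sincos_1d(y, half) + sincos_1d(x, half) for x in range(w_1)]
            for y in range(h_1)]


pe = position_encoding_2d(h_1=25, w_1=37, d=256)
```

The resulting grid has one d-dimensional vector per spatial position, so it can be added to the target image features position by position.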

S140, encoding the target image feature and the target position encoding vector by using an encoder structure of the VIT detection model to obtain a target encoding feature.

In this embodiment, the encoder structure includes a plurality of encoders that are connected end to end. For example: the encoder structure may be formed of 6 encoders in series.

Specifically, the encoding the target image feature and the target position encoding vector by using the encoder structure of the VIT detection model, and obtaining the target encoding feature includes:

adding the target image features and the target position encoding vector element by element to obtain a first input vector;

the first input vector is input to a first encoder in the encoder structure, and output data of a last encoder in the encoder structure is obtained as the target encoding feature.

In the above encoding process, the shape of the feature is unchanged.

S150, decoding the target coding feature by utilizing the decoder structure and the object queries of the VIT detection model to obtain a target decoding feature.

In this embodiment, the decoder structure includes a plurality of decoders connected end to end. For example: the decoder structure may be formed by 6 decoders in series.

Specifically, the decoding the target coding feature by using the decoder structure and the object queries of the VIT detection model to obtain a target decoding feature includes:

initializing based on object queries to obtain a query vector with all values of 0;

combining the query vector and the target coding feature to obtain a second input vector;

inputting the second input vector to a first decoder in the decoder structure and obtaining output data of a last decoder in the decoder structure as the target decoding feature.

For example: learnable object queries are introduced in order to provide the relation between the target object and the global image, that is, a query vector of all zeros with shape = (100, b, d) is initialized, where 100 denotes a total of 100 object queries; the queries are decoded in combination with the position encoding vector, and the decoder outputs shape = (6, b, 100, d), where 6 denotes the outputs of the 6 decoders. Each decoder receives the output of the previous decoder, and so on.
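The data flow of the example above (zero-initialized queries passed through a chain of decoders, with every decoder's output collected) can be sketched as follows. `placeholder_decoder` is a hypothetical stand-in that only mixes in the mean encoder token; it is not an attention layer and omits the position encoding, and the small sizes in the demo are chosen only to keep the run short.

```python
def init_queries(num_queries, b, d):
    """Query vector of all zeros with shape (num_queries, b, d)."""
    return [[[0.0] * d for _ in range(b)] for _ in range(num_queries)]


def placeholder_decoder(queries, memory):
    """Stand-in for a real decoder layer: adds the mean encoder token of each
    batch element to every query. memory has shape (tokens, b, d)."""
    n = len(memory)
    out = []
    for per_query in queries:
        row = []
        for bi, q in enumerate(per_query):
            mean = [sum(memory[t][bi][k] for t in range(n)) / n for k in range(len(q))]
            row.append([qv + mv for qv, mv in zip(q, mean)])
        out.append(row)
    return out


def run_decoder_stack(memory, num_layers=6, num_queries=100):
    """Chain the decoders and collect every layer's output.
    Returns nested lists with shape (num_layers, b, num_queries, d)."""
    b, d = len(memory[0]), len(memory[0][0])
    x = init_queries(num_queries, b, d)
    per_layer = []
    for _ in range(num_layers):
        x = placeholder_decoder(x, memory)  # each layer consumes the previous output
        # Reorder (num_queries, b, d) -> (b, num_queries, d) before storing.
        per_layer.append([[x[q][bi] for q in range(num_queries)] for bi in range(b)])
    return per_layer


memory = [[[1.0] * 8 for _ in range(2)] for _ in range(12)]  # (tokens, b, d)
outputs = run_decoder_stack(memory)
```

With b = 2 and d = 8 in this toy run, `outputs` has shape (6, 2, 100, 8), mirroring the (6, b, 100, d) shape described above.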

S160, inputting the target decoding feature to a classification and regression head module of the VIT detection model to obtain the cable prediction position in the power cable image.

In this embodiment, the target decoding feature is input to a classification and regression head module of the VIT detection model for classification and regression processing, so as to obtain the cable prediction position in the power cable image.
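One possible post-processing of the head outputs is sketched below. The application does not describe the internals of the classification and regression head, so the softmax over class scores, the sigmoid on box values, the trailing background class, the confidence threshold and all names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def predict_cables(class_logits, box_logits, class_names, threshold=0.5):
    """Turn per-query head outputs into (category, probability, box) tuples.

    Assumption: class_names[-1] is a background ("no object") class, and the
    box values are squashed into normalized (x, y, w, h) coordinates.
    """
    results = []
    for logits, raw_box in zip(class_logits, box_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == len(class_names) - 1 or probs[best] < threshold:
            continue  # background or low-confidence query: no detection
        box = tuple(sigmoid(v) for v in raw_box)
        results.append((class_names[best], probs[best], box))
    return results


detections = predict_cables(
    class_logits=[[4.0, 0.0], [0.0, 3.0]],
    box_logits=[[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
    class_names=["cable", "no_object"],
)
```

In this toy input the first query is kept as a cable detection and the second is discarded as background, which is how target detection frames and category attributes can be read off directly without anchors or NMS.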

According to the above technical solution, the application is based on a deep learning strategy and combines the multi-head attention mechanism of the Transformer, applying the Vision Transformer to high-altitude power cable target detection. The similarity between local and global representations of the power cable can be obtained from the image, so the method is suitable for complex outdoor environments and improves the detection effect and detection precision for the cable.

Fig. 2 is a schematic block diagram of a VIT-based power cable position detection device provided in an embodiment of the application. As shown in fig. 2, corresponding to the above VIT-based power cable position detection method, the present application further provides a VIT-based power cable position detection device. The VIT-based power cable position detection device includes units for performing the above VIT-based power cable position detection method, and may be configured in a device such as a detector. Specifically, referring to fig. 2, the VIT-based power cable position detection apparatus 200 includes an acquisition unit 201, an extraction unit 202, an encoding unit 203, a decoding unit 204, and a prediction unit 205, wherein:

the acquiring unit 201 is configured to acquire an image of a power cable, and acquire a pre-trained VIT detection model;

the extracting unit 202 is configured to perform feature extraction on the power cable image by using a skeleton network of the VIT detection model, so as to obtain a target image feature of the power cable image;

the encoding unit 203 is configured to perform position encoding on the target image feature to obtain a target position encoding vector;

the encoding unit 203 is further configured to encode the target image feature and the target position encoding vector by using an encoder structure of the VIT detection model to obtain a target encoding feature;

the decoding unit 204 is configured to decode the target coding feature by using the decoder structure and the object queries of the VIT detection model to obtain a target decoding feature;

the prediction unit 205 is configured to input the target decoding feature to a classification and regression head module of the VIT detection model, to obtain a cable predicted position in the power cable image.

The VIT-based power cable position detection apparatus described above may be implemented in the form of a computer program that is operable on an electronic device such as that shown in fig. 3.

Referring to fig. 3, fig. 3 is a schematic block diagram of an electronic device according to an embodiment of the present application. The electronic device 300 may be a device such as a detector, in particular a manipulation device in the detector, or a terminal or a server having a communication connection with the manipulation device.

Referring to fig. 3, the electronic device 300 includes a processor 302, a memory, and a network interface 305, which are connected by a system bus 301, wherein the memory may include a non-volatile storage medium 303 and an internal memory 304.

The non-volatile storage medium 303 may store an operating system 3031 and a computer program 3032. The computer program 3032 includes program instructions that, when executed, cause the processor 302 to perform a VIT-based power cable location detection method.

The processor 302 is used to provide computing and control capabilities to support the operation of the overall electronic device 300.

The internal memory 304 provides an environment for the execution of a computer program 3032 in the non-volatile storage medium 303, which computer program 3032, when executed by the processor 302, causes the processor 302 to perform a VIT-based power cable position detection method.

The network interface 305 is used for network communication with other devices. Those skilled in the art will appreciate that the structure shown in fig. 3 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the electronic device 300 to which the present application is applied, and that a particular electronic device 300 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

Wherein the processor 302 is configured to execute a computer program 3032 stored in a memory to implement the following steps:

acquiring a power cable image and a pre-trained VIT detection model;

It should be appreciated that in embodiments of the present application, the processor 302 may be a central processing unit (Central Processing Unit, CPU); the processor 302 may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

Those skilled in the art will appreciate that all or part of the flow in a method embodying the above described embodiments may be accomplished by computer programs instructing the relevant hardware. The computer program comprises program instructions, and the computer program can be stored in a storage medium, which is a computer readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.

Accordingly, the present application also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program, wherein the computer program includes program instructions. The program instructions, when executed by the processor, cause the processor to perform the steps of:

acquiring a power cable image and a pre-trained VIT detection model;

The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed.

The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the application can be combined, divided and deleted according to actual needs. In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The integrated unit may be stored in a storage medium if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing an electronic device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present application.

While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. The power cable position detection method based on the VIT is characterized by comprising the following steps of:

acquiring a power cable image and a pre-trained VIT detection model;

2. The VIT-based power cable location detection method of claim 1, wherein prior to obtaining a pre-trained VIT detection model, the method further comprises:

acquiring cable RGB images under different environmental conditions by using an image acquisition device to construct an image set to be annotated, and acquiring an initial VIT model and a pre-constructed loss function;

3. The VIT-based power cable location detection method of claim 2, wherein labeling each image to be labeled in the set of images to be labeled comprises:

and labeling each image to be labeled in the image set to be labeled at a preset distance from the damper.

4. The VIT-based power cable location detection method of claim 2, wherein the detecting whether the loss function reaches convergence during training includes:

5. The VIT-based power cable position detection method of claim 1, wherein the feature extraction of the power cable image using the skeleton network of the VIT detection model to obtain target image features of the power cable image comprises:

scaling the power cable image to obtain a first feature;

6. The VIT-based power cable position detection method of claim 1, wherein the position encoding the target image feature to obtain a target position encoded vector includes:

7. The VIT-based power cable location detection method of claim 1, wherein the encoder structure includes a plurality of encoders connected end to end; the encoding the target image feature and the target position encoding vector by using the encoder structure of the VIT detection model to obtain the target encoding feature includes:

8. The VIT-based power cable location detection method of claim 1, wherein the decoder structure includes a plurality of decoders connected end to end; the decoding the target coding feature by using the decoder structure and the object queries of the VIT detection model to obtain a target decoding feature includes:

9. An electronic device, the electronic device comprising:

a memory storing at least one instruction; and

a processor executing instructions stored in the memory to implement the VIT-based power cable position detection method of any one of claims 1-8.

10. A computer-readable storage medium, characterized by: the computer-readable storage medium has stored therein at least one instruction that is executed by a processor in an electronic device to implement the VIT-based power cable position detection method of any one of claims 1-8.