CN114821368A - Power defect detection method based on reinforcement learning and Transformer - Google Patents
Power defect detection method based on reinforcement learning and Transformer
- Publication number
- CN114821368A (application number CN202210482372.7A)
- Authority
- CN
- China
- Prior art keywords
- vector
- image
- transformer
- feature
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/17—Terrestrial scenes taken from planes or by drones
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a power defect detection method based on reinforcement learning and a Transformer, comprising the following steps: 1, acquiring an original data set by unmanned aerial vehicle aerial photography and augmenting it with a deep convolutional generative adversarial network; 2, extracting image features with a reinforcement learning module to search for foreground regions; and 3, compressing the feature vectors of the background regions, further extracting features through a Transformer module, and finally obtaining the prediction result through a fully connected layer. The invention uses deep learning to detect power defect regions, thereby reducing labor cost, remaining unaffected by external factors such as weather and background, and improving detection efficiency and detection accuracy.
Description
Technical Field
The invention relates to the field of deep learning, in particular to a power defect detection method based on reinforcement learning and a Transformer.
Background
With the continuous growth of power demand in China, the scale of power transmission lines is expanding rapidly. Exposed to complex natural environments for long periods and subject to interference such as rain erosion and lightning strikes, transmission lines are prone to various faults that affect the safe and stable operation of the power grid. Efficient inspection of transmission lines is therefore important for ensuring the normal operation of production and daily life.
Many transmission lines are erected in sparsely populated areas with complex terrain, so the cost and difficulty of traditional manual inspection are both high. Many regions therefore deploy unmanned aerial vehicles carrying image acquisition devices to rapidly collect image and video data of transmission lines, greatly reducing the inspection workload. However, aerial images are numerous and their backgrounds complex; manual judgment by workers is inefficient, prone to errors, and can hardly meet inspection requirements. Research on transmission line fault detection based on UAV aerial images therefore has important theoretical and practical value.
With the continuous development of UAV technology and computer technology, research on transmission line detection at home and abroad has made certain progress; by the principle of the recognition algorithm, existing methods can be divided into traditional image processing methods and deep learning methods. Feature extraction in traditional image processing mainly relies on hand-designed extractors. The document [Frequency-tuning-based insulator identification and positioning] proposes a frequency-tuned insulator identification and positioning algorithm that locates insulators against complex backgrounds through threshold segmentation in HSV color space and by computing the saliency of the target object with a frequency-tuning method. The document [A contact network insulator identification method based on Canny edge feature points] proposes a contact network (catenary) insulator identification method that extracts image edge features with Canny and feature points on the edge image with the SURF algorithm, achieving intelligent identification of contact network insulators. However, traditional algorithms depend excessively on manually designed extractors, their computation is heavy, and their detection accuracy and speed cannot meet the requirements of practical applications.
With the continuous improvement of computing power, deep learning has been applied to ever more fields and, thanks to its good generalization and representation ability, has gradually become mainstream. Currently, a model combining salient regions with Fast-CNN has been used for insulator feature detection, improving the detection of insulator working states in transmission lines; a lightweight-YOLOv3 insulator defect detection method constructs an improved lightweight network to realize insulator localization and fault detection. These deep learning methods greatly improve detection performance over traditional approaches, but the following problems remain: 1) data sets are relatively scarce, so trained models generalize poorly across different environments; 2) the final prediction is produced by a deep network that cannot reconstruct the information of small target objects, which hurts detection accuracy; 3) prior (anchor) boxes must be set for detection, performance depends heavily on their design, and anchor-related computation during training and detection consumes substantial resources, so such methods adapt poorly to line inspection tasks in varied environments; 4) detection over irrelevant background occupies large computing resources and degrades the final detection result.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention addresses the problems of existing power defect detection methods, namely poor universality across different environments, excessive dependence on prior boxes, and irrelevant background that occupies computing resources and affects accuracy, and provides a deep-learning-based power defect detection method to realize intelligent power defect inspection, reduce labor cost, and improve detection efficiency and detection accuracy.
The invention adopts the following technical scheme for solving the technical problems:
the invention discloses a power defect identification method based on reinforcement learning and a Transformer, which is characterized by comprising the following steps of:
step 1, collecting a power inspection aerial image set, using a deep convolutional generative adversarial network for data enhancement to obtain an expanded image data set, standardizing the expanded image data set, and labeling each standardized image with a target detection frame to obtain a training data set;
step 2, constructing a reinforcement learning module for screening foreground regions and background regions, inputting the training data set into the reinforcement learning module for training, and obtaining the foreground region feature vector set F_{i,f} and the background region vector set F_{i,b} of the i-th image in the training data set;
step 3, according to the foreground feature vector set F_{i,f} and the background region vector set F_{i,b} of the i-th image, using a bilinear pooling layer to compress the information of the non-key areas in the background region vector set F_{i,b}, obtaining a compressed feature vector set F_{i,c} of size M, and then concatenating it with the foreground feature vector set F_{i,f} to obtain a new vector combination F_i^*;
step 4, inputting the new vector combination F_i^* of the i-th image into a Transformer module for feature extraction to obtain a feature map T_i, then processing the feature map with a fully connected layer and outputting a prediction result; a loss function is constructed from the prediction and the ground truth and used to train the parameters of the Transformer module, finally obtaining a trained Transformer module for power defect identification.
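For orientation, the following is a minimal sketch of how steps 2 to 4 might compose at inference time; all module and parameter names (backbone, dqn, W_a, W_v, transformer, detector) are illustrative placeholders rather than names from the patent, and the greedy top-K selection stands in for the trained DQN policy.

```python
# Hypothetical end-to-end composition of steps 2-4; a sketch under assumed
# module interfaces, not the patented implementation. Shapes follow the text:
# a C x n x n feature map is split into n*n cell vectors, K foreground cells
# are kept, and the background cells are compressed into M vectors.
import torch

def detect_defects(image, backbone, dqn, W_a, W_v, transformer, detector, K):
    feat = backbone(image)                      # C x n x n feature map (step 2)
    cells = feat.flatten(1).t()                 # (n*n) x C unit-area vectors
    scores = dqn(cells).squeeze(-1)             # one action score per unit area
    keep = torch.zeros(cells.size(0), dtype=torch.bool)
    keep[scores.topk(K).indices] = True         # greedy top-K foreground choice
    fg, bg = cells[keep], cells[~keep]          # F_{i,f} and F_{i,b}
    a = bg @ W_a                                # aggregation weights (step 3)
    a = a / a.sum(dim=0, keepdim=True)          # normalize over background regions
    bg_c = a.t() @ (bg @ W_v)                   # M compressed background vectors
    tokens = torch.cat([fg, bg_c], dim=0)       # new vector combination F_i^*
    return detector(transformer(tokens))        # step 4 prediction
```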
The method for identifying an electric power defect according to the present invention is also characterized in that the step 2 includes:
step 2.1, constructing a reinforcement learning module, comprising: a DetNet backbone network for feature extraction and a DQN network; randomly initializing the parameters of the reinforcement learning module;
the backbone network is a convolutional neural network built on hole (dilated) convolution and comprises X convolution blocks; the x-th convolution block consists, in order, of an ordinary convolution layer with kernel size c_1 × c_1, a hole convolution layer with kernel size c_2 × c_2, and a convolution layer with kernel size c_3 × c_3; each convolution block is followed by a batch normalization layer and a ReLU activation function;
the DQN network consists of a fully connected network;
step 2.2, setting the size of the experience pool D according to the number of pictures in the training data set, and initializing the experience pool D;
step 2.3, inputting the i-th image with target detection frames in the training data set into the backbone network for feature extraction, obtaining a feature map F_i of size C × n × n, where C denotes the number of channels of the feature map F_i and n denotes its length and width;
step 2.4, dividing the feature map F_i into n × n unit areas of equal size, each representing one action choice, obtaining the action space A_i = {a_{i,1}, a_{i,2}, ..., a_{i,t}, ..., a_{i,n×n}}, where a_{i,t} is the action of selecting the t-th unit area of the i-th image;
step 2.5, calculating the reward score r_{i,t} of the unit area selected by the t-th action of the i-th image with formula (1), thereby obtaining the reward function J_i = {j_{i,1}, j_{i,2}, ..., j_{i,t}, ..., j_{i,n×n}} of the feature map F_i:
r_{i,t} = Δ_{i,t} / Δ_{i,all}    (1)
In formula (1), Δ_{i,t} denotes the number of target-object pixels contained in the t-th unit area of the i-th image, and Δ_{i,all} denotes the total number of pixels of the t-th unit area of the i-th image;
step 2.6, setting the state space s_i of the feature map F_i;
step 2.7, inputting the state space into the DQN network, obtaining the score of each action in the action space, selecting the K highest-scoring actions with a greedy strategy, and forming a Q-value table Q(s_i, a_i) from the state space s_i and its K actions;
then calculating the K reward values obtained under the current K actions according to the reward function R, and storing the i-th tuple formed by the K reward values and the Q-value table Q(s_i, a_i) in the experience pool D;
step 2.8, sequentially processing all images in the training data set according to the process from the step 2.3 to the step 2.7, thereby filling the experience pool D;
step 2.9, randomly selecting G groups from the experience pool D and updating the parameters of the current reinforcement learning module with the back-propagation algorithm;
step 2.10, repeating the process from step 2.3 to step 2.9 z_1 times, then saving the final reinforcement learning module; inputting the images of the training data set into the final reinforcement learning module, and for the final feature map F'_i output from the i-th image, selecting the K unit areas of F'_i with the highest reward values as the foreground region feature vector set F_{i,f} = {f_{i,f,l} | l = 1, ..., K}, with the remaining unit areas as the background region vector set F_{i,b} = {f_{i,b,r} | r = 1, ..., n×n−K}; f_{i,f,l} denotes the feature vector of the l-th foreground region of the i-th image and f_{i,b,r} the feature vector of the r-th background region of the i-th image.
The step 3 comprises the following steps:
step 3.1, obtaining an aggregation weight vector a corresponding to the r background region vector in the ith image by using the formula (2) i,r :
a i,r =f i,b,r W a (2)
In the formula (2), W a Representing a weight matrix with one dimension of C multiplied by M; a is i,r ={a i,r,m 1,. and M, wherein M represents a compressed vector dimension; a is i,r,m Is an aggregate weight vector a i,r The mth element in the ith image represents the mth aggregation weight corresponding to the characteristic vector of the mth background area in the ith image;
step 3.2, utilizing the formula (3) to perform the m-th aggregation weight a corresponding to the r-th background region feature vector in the i-th image i,r,m Carrying out standardization processing to obtain the r-th background region feature vector pair in the ith imageCorresponding mth normalized polymerization weight a' i,r,m :
In the formula (3), a i,r',m The mth aggregation weight corresponding to the characteristic vector of the mth background area in the ith image is shown,the m-th aggregation weight sum of all the background region feature vectors in the i-th image is represented;
step 3.2, obtaining the r characteristic projection vector f 'in the i image by using the formula (4)' i,r :
f′ i,r =f i,r W v (4)
In the formula (4), W v Representing a weight matrix with one dimension of C multiplied by C;
step 3.3, obtaining the m-th compressed feature vector f in the ith image by using the formula (5) i,m Thereby obtaining a compressed background region feature vector set F i,c ={f i,m ,|m=1,...,M}:
In formula (5), a represents convolution between vectors.
The step 4 specifically comprises the steps of:
step 4.1, constructing a Transformer module, comprising: a Transformer feature extraction network and an end-to-end detector; and randomly initializing the parameters of the Transformer module;
the Transformer feature extraction network consists of Y Transformer blocks built on the Transformer encoder; the y-th Transformer block consists, in order, of a first normalization layer, a multi-head self-attention layer, a second normalization layer, and a multi-layer perceptron, where the input of the first normalization layer is skip-connected to the output of the multi-head self-attention layer and the input of the second normalization layer is skip-connected to the output of the multi-layer perceptron;
the end-to-end detector is comprised of a fully connected network;
step 4.2, sending the new vector combination F_i^* of the i-th image into the Transformer feature extraction network; after feature extraction by the Y Transformer blocks in sequence, a feature map T_i with highly concentrated information is obtained; the feature map T_i is then input into the end-to-end detector for prediction, obtaining the prediction result.
Compared with the prior art, the invention has the beneficial effects that:
1. The method applies a deep convolutional generative adversarial network to data enhancement, generating pictures through a trained deep learning network, enlarging the data set, and improving the generalization performance of the model across different environments.
2. The method uses hole (dilated) convolution for feature extraction, which enlarges the receptive field without the information loss caused by pooling, so that each convolution output contains information from a larger range; compared with ordinary convolution, this improves the perception of small target objects and the localization of large target objects.
3. The method extracts key areas from the feature map with reinforcement learning and compresses the background-region feature map with a bilinear layer, reducing the share of computation spent on irrelevant background information in the Transformer network, accelerating calculation, and reducing the influence of irrelevant information vectors on the final detection result.
4. The method outputs the final detection result directly through a group of feed-forward networks, avoiding both the dependence of detection accuracy on anchor-box design and the imbalance between positive and negative anchor boxes during training that arise when traditional methods use prior boxes, thereby improving computational efficiency and accuracy.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention.
Detailed Description
In this embodiment, referring to FIG. 1, a power defect identification method based on reinforcement learning and a Transformer is performed according to the following steps:
step 1, collecting a power inspection aerial image set, using a deep convolutional generative adversarial network for data enhancement to obtain an expanded image data set, standardizing the expanded image data set, and labeling each standardized image with a target detection frame to obtain a training data set; in this embodiment, the training data set contains 8192 pictures;
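As an illustration of the data enhancement in step 1, the sketch below samples synthetic patches from a DCGAN-style generator; the architecture, latent dimension, and output size are assumptions for illustration only, since the patent does not fix them, and the generator is assumed to have been trained adversarially on the collected aerial images beforehand.

```python
# Sketch of DCGAN-style augmentation for step 1 (assumed architecture and
# latent size; the patent only states that a deep convolutional generative
# adversarial network is used for data enhancement).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),  # 3-channel output patch
        )

    def forward(self, z):
        return self.net(z)

# After adversarial training on the collected aerial images, synthetic
# samples are drawn to expand the data set:
G = Generator()
synthetic = G(torch.randn(16, 100, 1, 1))  # 16 synthetic 32x32 patches
```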
step 2, constructing a reinforcement learning module for screening the foreground regions and background regions, inputting the training data set into the reinforcement learning module for training, and obtaining the foreground region feature vector set F_{i,f} and the background region vector set F_{i,b} of the i-th image in the training data set;
Step 2.1, constructing a reinforcement learning module, comprising: a main network and a DQN network for extracting the characteristics of the DetNet; randomly initializing parameters of the reinforcement learning module;
the main network is a convolutional neural network constructed based on the hole convolution and comprises X convolutional blocks; wherein, the x-th convolution block is formed by a convolution kernel with the size of c 1 ×c 1 Has a common convolution, with a convolution kernel size of c 2 ×c 2 Has a hole convolution layer and a convolution kernel size of c 3 ×c 3 Wherein, each convolution block is connected with a batch normalization layer and a ReLu activation function; in this embodiment, the size of X is 6, and the size of the xth convolution block is c 1 =1、c 2 3 and c 3 =1。
The DQN network consists of fully connected networks;
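A sketch of one backbone convolution block of step 2.1 in PyTorch, using the embodiment's kernel sizes (1, 3, 1); the channel width and the dilation rate are assumptions, since the patent does not specify them.

```python
# One backbone block: 1x1 ordinary convolution, 3x3 dilated (hole)
# convolution, 1x1 convolution, followed by batch normalization and ReLU.
import torch.nn as nn

class DilatedConvBlock(nn.Module):
    def __init__(self, channels=256, dilation=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),              # c1 = 1
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),            # c2 = 3, hole conv
            nn.Conv2d(channels, channels, kernel_size=1),              # c3 = 1
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

backbone = nn.Sequential(*[DilatedConvBlock() for _ in range(6)])  # X = 6 blocks
```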
step 2.2, setting the size of the experience pool D according to the number of pictures in the training data set, and initializing the experience pool D; in this embodiment, the size of the experience pool D is 8192.
step 2.3, inputting the i-th image with target detection frames in the training data set into the backbone network for feature extraction, obtaining a feature map F_i of size C × n × n, where C denotes the number of channels of the feature map F_i and n denotes its length and width; in this embodiment, the number of channels C is 256 and the length n is 28;
step 2.4, dividing the feature map F_i into n × n unit areas of equal size, each representing one action choice, obtaining the action space A_i = {a_{i,1}, a_{i,2}, ..., a_{i,t}, ..., a_{i,n×n}}, where a_{i,t} is the action of selecting the t-th unit area of the i-th picture; in this embodiment, the feature map is divided into 28 × 28 unit areas;
step 2.5, calculating the reward score r_{i,t} of the unit area selected by the t-th action of the i-th picture with formula (1), thereby obtaining the reward function J_i = {j_{i,1}, j_{i,2}, ..., j_{i,t}, ..., j_{i,n×n}} of the feature map F_i:
r_{i,t} = Δ_{i,t} / Δ_{i,all}    (1)
In formula (1), Δ_{i,t} denotes the number of target-object pixels contained in the t-th unit area of the i-th picture, and Δ_{i,all} denotes the total number of pixels of the t-th unit area of the i-th picture;
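A sketch of formula (1): rasterize the labelled target detection frames into a binary mask and average it over each of the n × n unit areas; the per-cell mean then equals the ratio of target pixels to total pixels, i.e. r_{i,t}.

```python
# Reward map for step 2.5: one reward per unit area, computed as the
# fraction of that area's pixels covered by target objects.
import torch
import torch.nn.functional as F

def reward_map(target_mask, n=28):
    """target_mask: H x W binary mask rasterized from the target detection frames."""
    m = target_mask.float().unsqueeze(0).unsqueeze(0)  # 1 x 1 x H x W
    cells = F.adaptive_avg_pool2d(m, n)                # mean over each of the n x n areas
    return cells.view(n * n)                           # reward function J_i, one entry per action
```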
step 2.6, setting the state space s_i of the feature map F_i;
step 2.7, inputting the state space into the DQN network, obtaining the score of each action in the action space, selecting the K highest-scoring actions with a greedy strategy, and forming a Q-value table Q(s_i, a_i) from the state space s_i and its K actions;
then calculating the K reward values obtained under the current K actions according to the reward function R, and storing the i-th tuple formed by the K reward values and the Q-value table Q(s_i, a_i) in the experience pool D; in this embodiment, K is set to 260, i.e., the 260 highest-scoring actions are selected with the greedy strategy;
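A sketch of the action selection and experience storage in step 2.7; the epsilon-greedy exploration and the exact tuple layout are assumptions, as the patent specifies only greedy top-K selection and storage of the K reward values together with the Q-value table.

```python
# Step 2.7 sketch: score all n*n actions with the DQN, keep the K best
# under a (mostly) greedy policy, and append the transition to pool D.
import random
import torch

def select_and_store(dqn, state, rewards, pool, K=260, eps=0.1):
    q = dqn(state)                              # one Q value per action, shape (n*n,)
    if random.random() < eps:
        actions = torch.randperm(q.numel())[:K]    # occasional random exploration (assumed)
    else:
        actions = q.topk(K).indices                # greedy: K highest-scoring unit areas
    pool.append((state, actions, q[actions], rewards[actions]))  # i-th tuple into D
    return actions
```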
step 2.8, sequentially processing all images in the training data set according to the process from the step 2.3 to the step 2.7, thereby filling the experience pool D;
step 2.9, randomly selecting G groups from the experience pool D and updating the parameters of the current reinforcement learning module with the back-propagation algorithm;
step 2.10, repeating the process from step 2.3 to step 2.9 z_1 times, then saving the final reinforcement learning module; inputting the images of the training data set into the final reinforcement learning module, and for the final feature map F'_i output from the i-th image, selecting the K unit areas of F'_i with the highest reward values as the foreground region feature vector set F_{i,f} = {f_{i,f,l} | l = 1, ..., K}, with the remaining unit areas as the background region vector set F_{i,b} = {f_{i,b,r} | r = 1, ..., n×n−K}, where f_{i,f,l} is the feature vector of the l-th foreground region and f_{i,b,r} the feature vector of the r-th background region of the i-th image. In this embodiment, the number of training iterations z_1 of the reinforcement learning module is 200, the foreground region feature vector set contains 260 feature vectors, and the background region vector set contains 524 feature vectors;
step 3, according to the foreground feature vector set F_{i,f} and the background region vector set F_{i,b} of the i-th image, using a bilinear pooling layer to compress the information of the non-key areas in the background region vector set F_{i,b}, obtaining a compressed feature vector set F_{i,c} of size M, and then concatenating it with the foreground feature vector set F_{i,f} to obtain a new vector combination F_i^*;
step 3.1, obtaining the aggregation weight vector a_{i,r} corresponding to the r-th background region vector of the i-th image with formula (2):
a_{i,r} = f_{i,b,r} W_a    (2)
In formula (2), W_a is a weight matrix of dimension C × M; a_{i,r} = {a_{i,r,m} | m = 1, ..., M}, where M is the compressed vector dimension; a_{i,r,m}, the m-th element of the aggregation weight vector a_{i,r}, is the m-th aggregation weight corresponding to the r-th background region feature vector of the i-th image; in this embodiment, the compressed vector dimension M is 60;
step 3.2, normalizing the m-th aggregation weight a_{i,r,m} corresponding to the r-th background region feature vector of the i-th image with formula (3), obtaining the m-th normalized aggregation weight a'_{i,r,m} corresponding to the r-th background region feature vector of the i-th image:
a'_{i,r,m} = a_{i,r,m} / Σ_{r'=1}^{n×n−K} a_{i,r',m}    (3)
In formula (3), a_{i,r',m} is the m-th aggregation weight corresponding to the r'-th background region feature vector of the i-th image, and Σ_{r'=1}^{n×n−K} a_{i,r',m} is the sum of the m-th aggregation weights of all background region feature vectors of the i-th image; the dimensionality is thus changed and the n×n−K background vectors are projected into M vectors;
step 3.3, obtaining the r-th feature projection vector f'_{i,r} of the i-th image with formula (4):
f'_{i,r} = f_{i,r} W_v    (4)
In formula (4), W_v is a weight matrix of dimension C × C; in this embodiment, W_v is a weight matrix of dimension 256 × 256;
step 3.4, obtaining the m-th compressed feature vector f in the ith image by using the formula (5) i,m Thereby obtaining a compressed background region feature vector set F i,c ={f i,m ,|m=1,...,M}:
In formula (5), a represents convolution between vectors.
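Putting formulas (2) to (5) together with the embodiment's sizes, the following is a sketch of the background compression; reading the aggregation in formula (5) as a scalar-weighted sum of the projected vectors is an assumption about the vector product ⊙.

```python
# Bilinear-pooling background compression: aggregation weights (formula 2),
# normalization over background regions (formula 3), feature projection
# (formula 4), and weighted aggregation into M vectors (formula 5).
import torch

def compress_background(bg, W_a, W_v):
    """bg: (n*n-K) x C background vectors; W_a: C x M; W_v: C x C."""
    a = bg @ W_a                             # (n*n-K) x M aggregation weights
    a = a / a.sum(dim=0, keepdim=True)       # normalize over regions r (formula 3)
    proj = bg @ W_v                          # feature projections f'_{i,r} (formula 4)
    return a.t() @ proj                      # M x C compressed set F_{i,c} (formula 5)

bg = torch.randn(524, 256)                   # 524 background vectors, C = 256
W_a, W_v = torch.randn(256, 60), torch.randn(256, 256)
F_ic = compress_background(bg, W_a, W_v)     # 60 compressed vectors (M = 60)
```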
step 4, inputting the new vector combination F_i^* of the i-th image into the Transformer module for feature extraction to obtain a feature map T_i, then processing the feature map with the fully connected layer and outputting the prediction result; a loss function is constructed from the prediction and the ground truth and used to train the parameters of the Transformer module, finally obtaining a trained Transformer module for power defect identification.
step 4.1, constructing a Transformer module, comprising: a Transformer feature extraction network and an end-to-end detector; randomly initializing the parameters of the Transformer module;
the Transformer feature extraction network consists of Y Transformer blocks built on the Transformer encoder; the y-th Transformer block consists, in order, of a first normalization layer, a multi-head self-attention layer, a second normalization layer, and a multi-layer perceptron, where the input of the first normalization layer is skip-connected to the output of the multi-head self-attention layer and the input of the second normalization layer is skip-connected to the output of the multi-layer perceptron; in this embodiment, the number of Transformer blocks Y is 6;
the end-to-end detector consists of a fully connected network; in this embodiment, the fully connected network is formed of three layers;
step 4.2, sending the new vector combination F_i^* of the i-th image into the Transformer feature extraction network; after feature extraction by the Y Transformer blocks in sequence, a feature map T_i with highly concentrated information is obtained; the feature map T_i is then input into the end-to-end detector for prediction, obtaining the prediction result.
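A sketch of one pre-norm Transformer block and the three-layer detector head described above; the head count, MLP width, and output dimensionality are assumptions not fixed by the patent.

```python
# One Transformer block from step 4.1: normalization, multi-head
# self-attention with a skip connection, then normalization and an MLP
# with a second skip connection; Y = 6 blocks, three-layer FC detector.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]        # skip connection around attention
        return x + self.mlp(self.norm2(x))   # skip connection around the MLP

encoder = nn.Sequential(*[TransformerBlock() for _ in range(6)])  # Y = 6
detector = nn.Sequential(nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, 5))  # three FC layers; output size assumed
```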
Claims (4)
1. A reinforcement learning and Transformer-based electric power defect identification method is characterized by comprising the following steps:
step 1, collecting a power inspection aerial image set, using a deep convolutional generative adversarial network for data enhancement to obtain an expanded image data set, standardizing the expanded image data set, and labeling each standardized image with a target detection frame to obtain a training data set;
step 2, constructing a reinforcement learning module for screening foreground regions and background regions, inputting the training data set into the reinforcement learning module for training, and obtaining the foreground region feature vector set F_{i,f} and the background region vector set F_{i,b} of the i-th image in the training data set;
step 3, according to the foreground feature vector set F_{i,f} and the background region vector set F_{i,b} of the i-th image, using a bilinear pooling layer to compress the information of the non-key areas in the background region vector set F_{i,b}, obtaining a compressed feature vector set F_{i,c} of size M, and then concatenating it with the foreground feature vector set F_{i,f} to obtain a new vector combination F_i^*;
step 4, inputting the new vector combination F_i^* of the i-th image into a Transformer module for feature extraction to obtain a feature map T_i, then processing the feature map with a fully connected layer and outputting a prediction result; a loss function is constructed from the prediction and the ground truth and used to train the parameters of the Transformer module, finally obtaining a trained Transformer module for power defect identification.
2. The power defect identification method according to claim 1, wherein the step 2 comprises:
step 2.1, constructing a reinforcement learning module, comprising: a DetNet backbone network for feature extraction and a DQN network; randomly initializing the parameters of the reinforcement learning module;
the backbone network is a convolutional neural network built on hole (dilated) convolution and comprises X convolution blocks; the x-th convolution block consists, in order, of an ordinary convolution layer with kernel size c_1 × c_1, a hole convolution layer with kernel size c_2 × c_2, and a convolution layer with kernel size c_3 × c_3; each convolution block is followed by a batch normalization layer and a ReLU activation function;
the DQN network consists of a fully connected network;
step 2.2, setting the size of the experience pool D according to the number of pictures in the training data set, and initializing the experience pool D;
step 2.3, inputting the i-th image with target detection frames in the training data set into the backbone network for feature extraction, obtaining a feature map F_i of size C × n × n, where C denotes the number of channels of the feature map F_i and n denotes its length and width;
step 2.4, dividing the feature map F_i into n × n unit areas of equal size, each representing one action choice, obtaining the action space A_i = {a_{i,1}, a_{i,2}, ..., a_{i,t}, ..., a_{i,n×n}}, where a_{i,t} is the action of selecting the t-th unit area of the i-th image;
step 2.5, calculating the reward score r_{i,t} of the unit area selected by the t-th action of the i-th image with formula (1), thereby obtaining the reward function J_i = {j_{i,1}, j_{i,2}, ..., j_{i,t}, ..., j_{i,n×n}} of the feature map F_i:
r_{i,t} = Δ_{i,t} / Δ_{i,all}    (1)
In formula (1), Δ_{i,t} denotes the number of target-object pixels contained in the t-th unit area of the i-th image, and Δ_{i,all} denotes the total number of pixels of the t-th unit area of the i-th image;
step 2.6, setting the state space s_i of the feature map F_i;
step 2.7, inputting the state space into the DQN network, obtaining the score of each action in the action space, selecting the K highest-scoring actions with a greedy strategy, and forming a Q-value table Q(s_i, a_i) from the state space s_i and its K actions;
then calculating the K reward values obtained under the current K actions according to the reward function R, and storing the i-th tuple formed by the K reward values and the Q-value table Q(s_i, a_i) in the experience pool D;
step 2.8, sequentially processing all images in the training data set according to the process from the step 2.3 to the step 2.7, thereby filling the experience pool D;
step 2.9, randomly selecting G groups from the experience pool D and updating the parameters of the current reinforcement learning module with the back-propagation algorithm;
step 2.10, repeating the process from step 2.3 to step 2.9 z_1 times, then saving the final reinforcement learning module; inputting the images of the training data set into the final reinforcement learning module, and for the final feature map F'_i output from the i-th image, selecting the K unit areas of F'_i with the highest reward values as the foreground region feature vector set F_{i,f} = {f_{i,f,l} | l = 1, ..., K}, with the remaining unit areas as the background region vector set F_{i,b} = {f_{i,b,r} | r = 1, ..., n×n−K}; f_{i,f,l} denotes the feature vector of the l-th foreground region of the i-th image and f_{i,b,r} the feature vector of the r-th background region of the i-th image.
3. The power defect identification method according to claim 1, wherein the step 3 comprises:
step 3.1, obtaining the aggregation weight vector a_{i,r} corresponding to the r-th background region vector of the i-th image with formula (2):
a_{i,r} = f_{i,b,r} W_a    (2)
In formula (2), W_a is a weight matrix of dimension C × M; a_{i,r} = {a_{i,r,m} | m = 1, ..., M}, where M is the compressed vector dimension; a_{i,r,m}, the m-th element of the aggregation weight vector a_{i,r}, is the m-th aggregation weight corresponding to the r-th background region feature vector of the i-th image;
step 3.2, normalizing the m-th aggregation weight a_{i,r,m} corresponding to the r-th background region feature vector of the i-th image with formula (3), obtaining the m-th normalized aggregation weight a'_{i,r,m} corresponding to the r-th background region feature vector of the i-th image:
a'_{i,r,m} = a_{i,r,m} / Σ_{r'=1}^{n×n−K} a_{i,r',m}    (3)
In formula (3), a_{i,r',m} is the m-th aggregation weight corresponding to the r'-th background region feature vector of the i-th image, and Σ_{r'=1}^{n×n−K} a_{i,r',m} is the sum of the m-th aggregation weights of all background region feature vectors of the i-th image;
step 3.3, obtaining the r-th feature projection vector f'_{i,r} of the i-th image with formula (4):
f'_{i,r} = f_{i,r} W_v    (4)
In formula (4), W_v is a weight matrix of dimension C × C;
step 3.4, obtaining the m-th compressed feature vector f_{i,m} of the i-th image with formula (5), thereby obtaining the compressed background region feature vector set F_{i,c} = {f_{i,m} | m = 1, ..., M}:
f_{i,m} = Σ_{r=1}^{n×n−K} a'_{i,r,m} ⊙ f'_{i,r}    (5)
In formula (5), ⊙ denotes the convolution between vectors.
4. The method for identifying the power defect according to claim 1, wherein the step 4 specifically comprises the steps of:
step 4.1, constructing a Transformer module, comprising: a Transformer feature extraction network and an end-to-end detector; and randomly initializing the parameters of the Transformer module;
the Transformer feature extraction network consists of Y Transformer blocks built on the Transformer encoder; the y-th Transformer block consists, in order, of a first normalization layer, a multi-head self-attention layer, a second normalization layer, and a multi-layer perceptron, where the input of the first normalization layer is skip-connected to the output of the multi-head self-attention layer and the input of the second normalization layer is skip-connected to the output of the multi-layer perceptron;
the end-to-end detector is comprised of a fully connected network;
step 4.2, sending the new vector combination F_i^* of the i-th image into the Transformer feature extraction network; after feature extraction by the Y Transformer blocks in sequence, a feature map T_i with highly concentrated information is obtained; the feature map T_i is then input into the end-to-end detector for prediction, obtaining the prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210482372.7A CN114821368B (en) | 2022-05-05 | 2022-05-05 | Electric power defect detection method based on reinforcement learning and Transformer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210482372.7A CN114821368B (en) | 2022-05-05 | 2022-05-05 | Electric power defect detection method based on reinforcement learning and Transformer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114821368A true CN114821368A (en) | 2022-07-29 |
CN114821368B CN114821368B (en) | 2024-03-01 |
Family
ID=82510615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210482372.7A Active CN114821368B (en) | Electric power defect detection method based on reinforcement learning and Transformer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114821368B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021244079A1 (en) * | 2020-06-02 | 2021-12-09 | 苏州科技大学 | Method for detecting image target in smart home environment |
US11045271B1 (en) * | 2021-02-09 | 2021-06-29 | Bao Q Tran | Robotic medical system |
CN113808123A (en) * | 2021-09-27 | 2021-12-17 | 杭州跨视科技有限公司 | Machine vision-based dynamic detection method for liquid medicine bag |
CN114066820A (en) * | 2021-10-26 | 2022-02-18 | 武汉纺织大学 | Fabric defect detection method based on Swin-Transformer and NAS-FPN |
Non-Patent Citations (1)
Title |
---|
王刘旺; 周自强; 林龙; 韩嘉佳: "Review of the Application of Artificial Intelligence in Substation Operation and Maintenance Management" (人工智能在变电站运维管理中的应用综述), 高电压技术 (High Voltage Engineering), no. 01, 31 January 2020 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117094705A (en) * | 2023-10-19 | 2023-11-21 | 国网安徽省电力有限公司电力科学研究院 | Method, system and equipment for predicting defects of high-voltage switch cabinet |
CN117094705B (en) * | 2023-10-19 | 2024-01-02 | 国网安徽省电力有限公司电力科学研究院 | Method, system and equipment for predicting defects of high-voltage switch cabinet |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111062951B (en) | Knowledge distillation method based on semantic segmentation intra-class feature difference | |
CN109241982B (en) | Target detection method based on deep and shallow layer convolutional neural network | |
CN110930342B (en) | Depth map super-resolution reconstruction network construction method based on color map guidance | |
CN109146944B (en) | Visual depth estimation method based on depth separable convolutional neural network | |
CN112489164B (en) | Image coloring method based on improved depth separable convolutional neural network | |
CN111242844B (en) | Image processing method, device, server and storage medium | |
CN112785636B (en) | Multi-scale enhanced monocular depth estimation method | |
CN113066089B (en) | Real-time image semantic segmentation method based on attention guide mechanism | |
CN110009700B (en) | Convolutional neural network visual depth estimation method based on RGB (red, green and blue) graph and gradient graph | |
CN113449691A (en) | Human shape recognition system and method based on non-local attention mechanism | |
CN113554032A (en) | Remote sensing image segmentation method based on multi-path parallel network of high perception | |
CN113901928A (en) | Target detection method based on dynamic super-resolution, and power transmission line component detection method and system | |
CN114821368A (en) | Power defect detection method based on reinforcement learning and Transformer | |
CN114529793A (en) | Depth image restoration system and method based on gating cycle feature fusion | |
CN114581789A (en) | Hyperspectral image classification method and system | |
CN116523888B (en) | Pavement crack detection method, device, equipment and medium | |
CN116977631A (en) | Streetscape semantic segmentation method based on DeepLabV3+ | |
CN108596831B (en) | Super-resolution reconstruction method based on AdaBoost example regression | |
CN115331081A (en) | Image target detection method and device | |
CN111681176B (en) | Self-adaptive convolution residual error correction single image rain removing method | |
CN113240589A (en) | Image defogging method and system based on multi-scale feature fusion | |
CN112308772B (en) | Super-resolution reconstruction method based on deep learning local and non-local information | |
CN113239771A (en) | Attitude estimation method, system and application thereof | |
CN111899161A (en) | Super-resolution reconstruction method | |
CN115147774B (en) | Pedestrian re-identification method based on characteristic alignment in degradation environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |