CN114528755A - Power equipment fault detection model based on attention mechanism combined with GRU


Info

Publication number
CN114528755A
Authority
CN
China
Legal status: Pending
Application number
CN202210084475.8A
Other languages
Chinese (zh)
Inventor
张晓华
吕志瑞
武宇平
黄彬
孙云生
杨静宇
卢毅
马鑫晟
张连超
李世杰
Current Assignee
Beijing Kedong Electric Power Control System Co Ltd
State Grid Jibei Electric Power Co Ltd
Electric Power Research Institute of State Grid Jibei Electric Power Co Ltd
Original Assignee
Beijing Kedong Electric Power Control System Co Ltd
State Grid Jibei Electric Power Co Ltd
Electric Power Research Institute of State Grid Jibei Electric Power Co Ltd
Application filed by Beijing Kedong Electric Power Control System Co Ltd, State Grid Jibei Electric Power Co Ltd, and Electric Power Research Institute of State Grid Jibei Electric Power Co Ltd
Priority: CN202210084475.8A
Publication: CN114528755A

Classifications

    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F18/23213: Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/08: Learning methods
    • G06Q10/20: Administration of product repair or maintenance
    • G06Q50/06: Energy or water supply
    • G06T3/4007: Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
    • G06F2113/04: Power grid distribution networks
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention discloses a power equipment fault detection model based on an attention mechanism combined with a GRU. The model comprises a classification neural network whose training data are derived from a preprocessing model. The preprocessing model converts unbalanced input power equipment data into balanced data, produces embedded representations, and outputs intermediate data: a historical state sequence of power equipment representations, an embedded representation of the label data, and an embedded representation of the power equipment portrait features. The GRU module extracts temporal and spatial features of the power equipment from the historical state sequence; the attention mechanism module extracts state sequence features from the GRU output; the graph attention mechanism module extracts environment information of the power equipment from the embedded representation of the portrait features. The state sequence features, the label data embedding, and the environment information are aligned and fused as the training input of the classification neural network.

Description

Power equipment fault detection model based on attention mechanism combined with GRU
Technical Field
The invention belongs to the technical field of power equipment fault detection, relates to a power grid fault detection method, and particularly relates to a power equipment fault detection model based on an attention mechanism combined with GRUs.
Background
In an era of rapid technological development and continuous optimization of the economic structure, the power sector faces significant challenges. As the number of power consumers and enterprises grows, especially in areas of large-scale industrial development, the demand for reliable power supply rises. When power supply equipment in these areas fails, industrial equipment may be forced out of operation for long periods, causing a series of serious consequences; automatic fault detection of power equipment therefore plays an important role in the power supply system. Existing fault detection methods for traditional power equipment are complex and inefficient, and existing models cannot make full use of unbalanced power fault data, which greatly reduces their practical value.
Therefore, how to provide a training model and method for power equipment fault detection that better performs optimization, recognition and classification tasks and improves detection accuracy is a technical problem urgently to be solved by those skilled in the art.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a power equipment fault detection model based on an attention mechanism combined with a GRU, which is reasonable in design, makes full use of the data, and achieves high detection accuracy.
In order to achieve the above object, the present invention provides a power equipment fault detection model based on an attention mechanism combined with a GRU. The fault detection model comprises a classification neural network model whose training data are derived from a preprocessing model; the preprocessing model comprises an up-sampling module, a word embedding representation learning module, a GRU module, an attention mechanism module and a graph attention mechanism module,
the up-sampling module is used for converting input unbalanced power equipment data into balanced data;
the word embedding representation learning module is used for embedding and representing the balanced data, outputting a historical state sequence of power equipment representations, an embedded representation of the label data, and an embedded representation of the power equipment portrait features;
the GRU module is used for extracting time and space characteristics of the electric power equipment from a history state sequence which is output by the word embedding representation learning module and is based on the electric power equipment representation;
the attention mechanism module is used for extracting state sequence features from the time and space features of the equipment;
the graph attention mechanism module is used for extracting environment information of the power equipment from the embedded representation of the power equipment portrait characteristics;
and performing alignment fusion on the state sequence features, the label data embedded representation and the environment information to serve as training data input of the classification neural network.
Further, the up-sampling algorithm adopted by the up-sampling module is an SC-SMOTE up-sampling algorithm.
Further, the SC-SMOTE upsampling algorithm specifically includes:
step 21: traversing input power equipment data set data, and determining a majority seed sample and a minority seed sample;
step 22: according to the seed sample information, simultaneously performing upsampling on the majority class and the minority class, and calculating the number of samples generated by each minority type of seed samples;
step 23: after the number of samples generated by each few types of seed samples is obtained, carrying out linear interpolation to obtain a final new sample, and combining the newly generated sample and the original seed sample together to generate a balanced sample data set;
step 24: and carrying out embedded representation on the data in the generated balance sample data set.
Further, step 21 includes: traversing the power equipment data set and determining the neighbor sample set D_n of each sample x using the KNN algorithm. Within the neighbor set D_n, the set of samples of the same class as sample x is denoted D_same, and the set of samples of a different class is denoted D_other. The sizes of D_same and D_other are compared according to the formula:

S(x) = 1 if |D_same| ≥ |D_other|, otherwise S(x) = 0

to judge whether sample x is a seed sample, and the seed sample label S is added to the original data set.
Further, the calculation of the number of generated samples in step 22 uses the following formulas:

label_diff_j = N_maj − N_j

Rs_j = |D_s_maj| / |D_s_j|

N_gj = Rs_j + label_diff_j / |D_s_j|

where label_diff_j is the difference between the number of majority-class samples and the number of samples of minority class C_j in the original data set; N_j is the number of samples belonging to class C_j; D_s_j is the set of seed samples belonging to class C_j; D_s_maj is the set of majority-class seed samples; Rs_j is the ratio of the majority-class seed samples to the class-C_j seed samples; N_gj is the number of new samples generated on average by each class-C_j seed sample; and label_diff_j / |D_s_j| is the number of samples each class-C_j seed needs to generate to balance the difference in raw data counts.
Further, step 23 includes: after the number of samples to be generated by each seed sample is obtained, a K-means algorithm is used, updating the cluster center coordinates at each iterative partition of the samples according to the Euclidean distance between the cluster centers and the samples. The hyper-parameter k_c of the K-means algorithm is the number of class clusters; its value depends on the ratio of the number of majority-class to minority-class samples in the data set, formulated as:

k_c = ⌈N_maj / N_min⌉

After clustering the data set with the K-means algorithm, each sample is marked with the cluster label C of its cluster, and the data set is updated as:

D = {(x, y, S, C)}

where x is a sample, y its class label, S its seed label and C its cluster label.
further, the data processing method for the newly generated sample in step 23 includes: screening out samples of the same category in each category cluster to form a sample set DcEach sample contains a feature set F ═ F1,f2,…,fpAnd then according to different feature types, executing: when the characteristics are discrete characteristics, field selection is carried out according to probability distribution of different fields in the data generation process; when the characteristic is a continuous characteristic, in the data generation process, the [ min, max ] of the characteristic value is taken]And randomly selecting data in the interval as a generated value, wherein max and min are respectively the maximum value and the minimum value of the characteristic value.
Further, the method for obtaining the final new sample by linear interpolation in step 23 is as follows:

For each seed sample x_i with class label y_i and cluster c_i, there is a number N_gi of new samples to be generated. Each time a new sample is generated, according to N_gi and the distribution FD[c_i][y_i] of each feature of the cluster in which x_i is located, an auxiliary sample x_temp is first generated, and linear interpolation is then carried out to obtain the final generated sample x_new.

The construction of the auxiliary sample x_temp must satisfy three rules:

the temporary sample x_temp and the sample x_i belong to the same class label y_i;

the temporary sample x_temp and the sample x_i belong to the same cluster c_i;

the temporary sample x_temp and the sample x_i have the same features, but the value of each feature is obtained by random sampling from the feature distribution FD[c_i][y_i] of class y_i in cluster c_i.

After obtaining the temporary sample x_temp, the new sample x_new is obtained by linear interpolation:

x_temp = [f_1, f_2, …, f_p], f_p = Random(FD[c_i][y_i][p])

x_new = x_i + Random(0,1) × (x_temp − x_i)

After a seed sample has cycled through N_gi sample-generation operations, a group of generated samples based on that seed sample is obtained, all belonging to the same class as the seed sample. After every seed sample has finished sample generation, the obtained generated sample set D_g is merged with the original data set D to obtain the final required balanced data set D_balance.
Further, the data output of the classification neural network model is expressed as:

y_pred = argmax( softmax( W_deep · [O_b ; e′_p ; e_a] + b_deep ) )

where y_pred ∈ {0, 1, 2}; O_b is the vector representation of the state sequence features; e′_p is the state feature vector representation of the power equipment; e_a is the vector representation of the prediction target; and W_deep, b_deep are output layer parameters.
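A minimal numpy sketch of this fused output layer. The dimensions, the concatenation-based alignment fusion, and the argmax readout are illustrative assumptions; the patent gives only the symbols O_b, e′_p, e_a, W_deep, b_deep:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
O_b = rng.normal(size=8)     # state-sequence feature vector (assumed size)
e_p = rng.normal(size=8)     # device state (portrait) feature vector
e_a = rng.normal(size=8)     # prediction-target vector

fused = np.concatenate([O_b, e_p, e_a])    # alignment fusion of the three inputs
W_deep = rng.normal(size=(3, fused.size))  # 3 classes: y_pred in {0, 1, 2}
b_deep = np.zeros(3)

probs = softmax(W_deep @ fused + b_deep)
y_pred = int(np.argmax(probs))
```

The three-way concatenation keeps the sequence, portrait and target information in separate slots of one input vector, which is the simplest reading of "alignment fusion".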
Further, the vector representation O_b of the state sequence features is calculated by the GRU module and the attention mechanism module.

The data processing procedure and result of the GRU module are expressed as:

r_t = σ(W_r i_t + U_r h_{t−1} + b_r)

z_t = σ(W_z i_t + U_z h_{t−1} + b_z)

h̃_t = tanh(W_h i_t + U_h (r_t ⊙ h_{t−1}) + b_h)

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

where σ denotes the Sigmoid function and ⊙ denotes the Hadamard product, with

W_r, W_z, W_h ∈ R^{n_hid×k}, U_r, U_z, U_h ∈ R^{n_hid×n_hid}, b_r, b_z, b_h ∈ R^{n_hid}

where n_hid is the GRU hidden layer size and k the embedded vector size; i_t is the input of the GRU, the t-th vector in the historical state sequence, i.e. i_t = e_b[t]. The output value h_t of the GRU module is the t-th hidden state, a latent expression of the past state of the power equipment.
the data processing procedure and the result of the attention mechanism module are expressed as follows:
Figure BDA0003486959580000045
Figure BDA0003486959580000046
wherein, atAn attention score representing an attention distribution calculation of the attention mechanism module; f [.]Representing the attention scoring function.
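A short sketch of this attention pooling over the GRU hidden states. The patent leaves the scoring function f unspecified; a learned dot-product score (f(h_t) = w · h_t with a weight vector w) is used here as a stand-in assumption:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)
T, n_hid = 5, 6
hs = rng.normal(size=(T, n_hid))  # GRU hidden states h_1..h_T
w = rng.normal(size=n_hid)        # scoring vector: f(h_t) = w . h_t (assumed form)

a = softmax(hs @ w)               # attention scores a_t over the T steps
O_b = a @ hs                      # weighted sum of hidden states -> O_b
```

The weighted sum lets informative time steps dominate the state-sequence feature O_b instead of taking only the last hidden state.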
Further, the state feature vector representation e′_p of the power equipment is calculated by the graph attention mechanism module. The data processing result of the graph attention mechanism module is expressed as:

e′_p = [e_p, e′_p1, e′_p2, …, e′_pn, X]

where e_p is the embedded feature vector of the power equipment; e′_p1, e′_p2, …, e′_pn are the attention-weighted embedded vectors of the power equipment nodes related to the equipment whose embedded vector is e_p, computed as

e′_pi = α_i · e_pi, α_i = softmax_i( f(e_p, e_pi) )

and X = {α_1, α_2, …, α_n} is the set of attention coefficients.
The invention realizes the following beneficial effects:
1. The present invention uses an attention mechanism in conjunction with the timing-feature capture capability of the GRU. To address the problem that time-series feature information of the input data cannot be fully utilized, the invention uses a GRU combined with an attention mechanism to improve the extraction of temporal and spatial-structure features from the input samples, capturing more equipment state features; the obtained power equipment features are then fed into a neural network to predict the fault condition of the power equipment. This effectively solves the problem of inefficient utilization of the input power equipment sample information. Unlike traditional power equipment fault detection methods, the invention uses a GRU network to simultaneously extract temporal and spatial feature information from the input sample data, and an attention mechanism to perceive the state sequence features of the input samples. Multi-dimensional capture of input sample features provides high-quality feature input for the classification of the downstream neural network model, so that the model can better identify and detect the state of the input power grid equipment.
2. When extracting the features of a single power equipment node, the invention considers not only the node's own state-feature time-series information and spatial information but also the influence of the surrounding power equipment. The invention captures the information of related surrounding equipment with a graph attention mechanism, i.e., fuses in environment information, which raises the feature dimensionality of the single device, provides more feature information for the classification task of the downstream neural network, and improves the model's ability to detect faulty equipment.
3. The invention adopts an SC-SMOTE-based data up-sampling technique, which effectively alleviates the problems caused by unbalanced power equipment samples.
4. By adopting SC-SMOTE and an attention-based GRU network, and by multi-dimensional extraction of input power equipment sample features, the invention achieves automatic and accurate detection of power equipment faults in power grid supply scenarios.
Drawings
FIG. 1 is a schematic diagram of the SC-SMOTE based upsampling technique of the present invention to obtain balanced sample data;
FIG. 2 is a schematic flow chart of the process of feature acquisition in time and space for input samples based on an attention mechanism in combination with GRUs according to the present invention;
FIG. 3 is a diagram of the related-information capturing network framework for obtaining environmental auxiliary information of the current device node based on the graph attention mechanism according to the present invention;
fig. 4 is an overall frame diagram of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures.
The invention will now be further described with reference to the accompanying drawings and detailed description.
The invention provides a power equipment fault detection model based on an attention mechanism combined with a GRU. The power equipment fault detection model comprises a classification neural network model, training data of the classification neural network model is derived from a preprocessing model, the preprocessing model comprises an up-sampling module, a word embedding representation learning module, a GRU module, an attention mechanism module and a graph attention mechanism module, and the up-sampling module is used for converting input unbalanced power equipment data into balanced data; the word embedding representation learning module is used for embedding and representing the balance data and outputting a historical state sequence based on the power equipment representation, a label data embedding representation, and a state (power equipment portrait characteristic) embedding representation of the power equipment and the corresponding related equipment; the GRU module is used for extracting time and space characteristics of the electric power equipment from a history state sequence which is output by the word embedding representation learning module and is based on the electric power equipment representation; the attention mechanism module is used for extracting state sequence features from the time and space features of the equipment; the graph attention mechanism module is used for extracting environment information of the power equipment from the embedded representation of the power equipment portrait characteristics; and performing alignment fusion on the state sequence features, the label data embedded representation and the environment information to serve as training data input of the classification neural network.
Specifically, the construction and training of the power equipment fault detection model based on the attention mechanism and the GRU comprise the following steps:
step 1, inputting a power equipment data set, wherein the power equipment data set comprises a power equipment entity state and a corresponding label; wherein, power equipment entity state includes the gaseous composition content condition in the transformer oil, the transformer partial discharge condition, equipment contact surface temperature condition, various state data such as the condition that the internal element wets, and power equipment's label is power equipment like transformer fault type, includes: insulation deterioration, abnormal vibration, and the like.
Step 2: convert the input unbalanced power equipment data into balanced data using the SC-SMOTE up-sampling algorithm, and then produce an embedded representation (Embedding).
the specific steps of the step 2 comprise:
taking the entity state and the label of the electric power equipment contained in the data set of the electric power equipment in the step 1 as the input of the SC-SMOTE up-sampling algorithm, taking the input as seed data, generating a new sample according to the corresponding characteristic distribution matrix, then obtaining a final new sample through linear interpolation, merging the newly generated sample and the original seed sample together to generate a balanced sample data set, and then embedding and representing the data:
(1) obtaining a power equipment data set, wherein the power equipment data set comprises equipment entity states and corresponding labels;
(2) The power equipment data set is traversed, and the neighbor sample set D_n of sample x is determined using the KNN (K-Nearest Neighbor) algorithm. In the neighbor set D_n there are samples of different classes: the set of samples of the same class as sample x is denoted D_same, and the set of samples of a different class is denoted D_other. The sizes of D_same and D_other are compared, whether sample x is a seed sample is judged according to formula 2.1, and the seed sample label S is added to the original data set.

S(x) = 1 if |D_same| ≥ |D_other|, otherwise S(x) = 0 (2.1)
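The seed-sample test can be sketched as follows. This is a minimal illustration on a toy 2-D data set, assuming Euclidean distance and k = 3 neighbors; the helper names (`knn_indices`, `is_seed`) are ours, not from the patent:

```python
import numpy as np

def knn_indices(X, i, k):
    # indices of the k nearest neighbors of sample i (excluding itself)
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return np.argsort(d)[:k]

def is_seed(X, y, i, k=3):
    # sample i is a seed when at least as many neighbors share its
    # class (D_same) as belong to other classes (D_other)
    nn = knn_indices(X, i, k)
    same = int(np.sum(y[nn] == y[i]))
    return same >= (k - same)

# two clean clusters plus one noisy minority point inside the majority region
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [0.05, 0.05]])
y = np.array([0, 0, 0, 1, 1, 1])
seeds = [i for i in range(len(y)) if is_seed(X, y, i)]
```

The noisy point at (0.05, 0.05) is surrounded by the other class, so it fails the test: only samples in class-consistent neighborhoods become seeds.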
(3) According to the seed sample information, the majority class and the minority class are up-sampled simultaneously. For majority-class seed samples the sampling rate is 100%, i.e., each majority-class seed sample generates one new sample. The sampling rate of minority-class seed samples is determined by the sample ratio of the original data and the sample ratio of the seed samples.
To compensate for the sample difference in the original data set, the difference label_diff_j between the number of majority-class and minority-class samples in the original data set must be calculated:

label_diff_j = N_maj − N_j (2.2)

where N_maj is the number of majority-class samples and N_j is the number of samples belonging to class C_j.
In the seed sample set there are more majority-class seeds than minority-class seeds, and the sampling rate of the majority-class seeds is 100%. To compensate for the difference in seed counts, the ratio Rs_j of majority-class seeds to minority-class seeds must be calculated:

Rs_j = |D_s_maj| / |D_s_j| (2.3)

where D_s_maj is the set of majority-class seed samples and D_s_j is the set of seed samples belonging to class C_j.
The number N_gj of new samples generated on average by each seed sample of class C_j is calculated as:

N_gj = Rs_j + label_diff_j / |D_s_j| (2.4)

where Rs_j is the number of samples each seed sample needs to generate to balance the difference in seed counts, and label_diff_j / |D_s_j| is the number of samples each seed sample needs to generate to balance the difference in raw data counts.
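A worked example of the generation-count formulas 2.2 to 2.4. All the counts are hypothetical, and formula 2.4 is read here as the sum of the two per-seed quotas described in the text (seed-count balancing plus raw-count balancing):

```python
# hypothetical counts, not from the patent
N_maj = 100        # samples in the majority class
N_j = 20           # samples in minority class C_j
D_s_maj = 40       # |D_s_maj|: majority-class seed samples
D_s_j = 10         # |D_s_j|: seed samples of class C_j

label_diff_j = N_maj - N_j             # 2.2: raw-count difference -> 80
Rs_j = D_s_maj / D_s_j                 # 2.3: seed-count ratio -> 4.0
N_gj = Rs_j + label_diff_j / D_s_j     # 2.4: new samples per C_j seed -> 12.0
```

Each of the 10 class-C_j seeds thus generates 12 new samples: 4 to match the majority seeds and 8 to close the 80-sample gap in the raw data.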
(4) After the number of samples to be generated by each minority-class seed sample is obtained, a K-means clustering algorithm is used, updating the cluster center coordinates at each iterative partition of the samples according to the Euclidean distance between the cluster centers and the samples. The hyper-parameter k_c of the K-means algorithm is the number of class clusters; in the SC-SMOTE algorithm, its value depends on the ratio of majority-class to minority-class samples in the data set, expressed as:

k_c = ⌈N_maj / N_min⌉ (2.5)

The data set is clustered by the ordinary K-means algorithm, each sample is marked with the cluster label C of its cluster, and the data set is updated as:

D = {(x, y, S, C)}
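The k_c choice and the iterative center update can be sketched with a hand-rolled K-means on a toy data set. The counts and points are illustrative assumptions, and k = 2 is used for the tiny example rather than the computed k_c:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # minimal K-means: assign each sample to the nearest center by
    # Euclidean distance, then recompute centers as cluster means
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# hyper-parameter k_c from hypothetical class counts (formula 2.5)
N_maj, N_min = 100, 20
k_c = int(np.ceil(N_maj / N_min))   # -> 5

# two well-separated toy groups; K-means with k=2 recovers them
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10.0])
labels = kmeans(X, 2)
```

After clustering, each sample's cluster label would be appended to the data set record, giving the (x, y, S, C) tuples used by the later generation steps.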
(5) The samples of the same category in each class cluster are screened out to form a sample set D_c; each sample contains the feature set F = {f_1, f_2, …, f_p}, which is processed according to feature type.

For discrete features, such as "abnormal sound" or "abnormal machine vibration", the value cannot be selected at random from all fields; it must be determined by the frequency of occurrence of the different field values, so that the feature distribution of the generated samples, and of the final balanced data set, is unchanged.

For continuous features, such as "temperature data of the device itself", values must be taken in the interval [min, max] during data generation, so the maximum and minimum of the feature value are computed and data are selected at random in [min, max] as generated values. For the p features of the L different classes in each of the k_c class clusters, the computed distribution has dimension k_c × L × p × 2.
(6) For each seed sample x_i of category y_i in class cluster c_i, there is a number N_gi of new samples to be generated. Each time a new sample is generated, according to N_gi and the distribution FD[c_i][y_i] of each feature of the cluster in which the seed lies, an auxiliary sample x_temp is first generated, and linear interpolation then yields the final generated sample x_new.
The SC-SMOTE algorithm first constructs the auxiliary sample x_temp according to the feature distribution. The auxiliary sample x_temp must satisfy three rules:
the temporary sample x_temp and the sample x_i belong to the same category label y_i;
the temporary sample x_temp and the sample x_i belong to the same cluster c_i;
the temporary sample x_temp has the same features as the sample x_i, but the value of each feature is obtained by random sampling from the feature distribution FD[c_i][y_i] of class cluster c_i;
After the temporary sample x_temp is obtained, the new sample x_new is obtained by linear interpolation:
x_temp = [f_1, f_2, …, f_p],  f_p = Random(FD[c_i][y_i][p])  (2.6)
x_new = x_i + Random(0, 1) × (x_temp − x_i)  (2.7)
After the seed sample has cycled through N_gj sample generation operations, a group of generated samples based on that seed sample is obtained; the generated samples and the seed sample belong to the same category. After every seed sample has completed sample generation, the resulting generated sample set D_g is merged with the original data set D to obtain the finally required balanced data set D_balance. In the balanced data set, the proportion between majority and minority classes is restored to normal, and the overall sample count is also expanded.
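The generation step of Eqs. 2.6–2.7 can be sketched as below. Representing FD[c_i][y_i] as a list of per-feature samplers and treating all features as continuous are simplifying assumptions for the sketch; discrete features would instead draw a field value from their frequency distribution.

```python
import random

def generate_sample(seed_sample, feature_samplers, rng=random):
    """SC-SMOTE generation step (Eqs. 2.6-2.7): first build the auxiliary
    sample x_temp by sampling every feature from the cluster/class
    distribution FD[c_i][y_i], then linearly interpolate between the seed
    sample and x_temp to obtain the new sample x_new."""
    # Eq. 2.6: f_p = Random(FD[c_i][y_i][p]) for each feature p
    x_temp = [sampler() for sampler in feature_samplers]
    # Eq. 2.7: x_new = x_i + Random(0, 1) * (x_temp - x_i)
    gap = rng.random()
    return [x + gap * (t - x) for x, t in zip(seed_sample, x_temp)]

# Illustrative continuous samplers over assumed [min, max] ranges
samplers = [lambda: random.uniform(60.0, 64.0), lambda: random.uniform(0.0, 1.0)]
seed = [61.0, 0.2]
new = generate_sample(seed, samplers)
```

Because x_new lies on the segment between the seed sample and x_temp, every generated feature stays inside the cluster's observed value range.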
(7) For the finally obtained samples, the data format is defined as M × N, where M is the number of samples, representing the descriptions of the different power devices, and N is the number of features, including device temperature, device image features, device parameter features and context features. In feature processing, the common practice is to discretize continuous features. After encoding, the discrete features make the data matrix extremely sparse; without effective processing, the parameter count of the subsequent modeling process would increase greatly. The main function of the data embedding layer is to compress and represent the sparse vectors produced by one-hot coding. The dimensionality of a data vector passing through the embedding layer is markedly reduced, and the feature information is represented mainly in numerical form. Suppose the feature vector after one-hot coding is represented as [x_1; x_2; …; x_n], where n is the number of feature fields and x_i is the one-hot code representation of feature field i. The size of the embedding layer matrix V is n × k, where k is the size of the embedding layer vector.
After passing through the embedding layer, the sparse vectors are encoded into dense vectors of equal length; the output of the embedding layer is denoted E, as shown in Equation 2.8.
E = [e_1, e_2, …, e_n] = [v_1 x_1, v_2 x_2, …, v_n x_n]  (2.8)
where e_i denotes a feature field vector. For single-valued features, each x_i has only one bit equal to 1, and the feature field vector is a single feature vector. For multi-valued features, e_i consists of a plurality of vectors. This completes the embedded representation of the data set.
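Equation 2.8 can be illustrated with a small NumPy sketch. Storing one embedding table per feature field, of shape k × |field vocabulary|, is an assumed layout of the embedding matrix V; the point is that each one-hot x_i collapses to the dense length-k column it selects.

```python
import numpy as np

def embed(one_hots, tables):
    """Eq. 2.8: compress each field's one-hot code x_i into a dense
    length-k vector e_i = V_i x_i; the embedding layer output is
    E = [e_1, ..., e_n]."""
    return np.stack([V @ x for V, x in zip(tables, one_hots)])

k = 4
rng = np.random.default_rng(0)
vocab_sizes = [3, 5]                      # two feature fields
tables = [rng.normal(size=(k, v)) for v in vocab_sizes]
x1 = np.eye(3)[1]                         # one-hot for field 1, value index 1
x2 = np.eye(5)[4]                         # one-hot for field 2, value index 4
E = embed([x1, x2], tables)
```

Multiplying a table by a one-hot vector is exactly a column lookup, which is why embedding layers are implemented as lookups rather than dense matrix products in practice.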
In step 3, on the basis of the sample embedded representation obtained in step 2, a preprocessing model based on an attention mechanism and a GRU network is defined. The preprocessing model comprises a module that captures the state sequence features of a single device in the two dimensions of space and time, and a module that acquires auxiliary information from the related devices surrounding a single device.
The specific steps of the step 3 comprise:
(1) A state trend capture layer (GRU module) is defined. After the behavior sequence data has been given an embedded representation, the sequential relation of the behavior sequence is modeled with a GRU network, as shown in Equation 2.18.
r_t = σ(W_r i_t + U_r h_{t−1} + b_r)
z_t = σ(W_z i_t + U_z h_{t−1} + b_z)
h̃_t = tanh(W_h i_t + U_h (r_t ⊙ h_{t−1}) + b_h)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t  (2.18)
where σ denotes the Sigmoid function, ⊙ denotes the Hadamard product, and
W_r, W_z, W_h ∈ ℝ^{n_hid×k},  U_r, U_z, U_h ∈ ℝ^{n_hid×n_hid},  b_r, b_z, b_h ∈ ℝ^{n_hid}
where n_hid denotes the GRU network hidden layer size and k denotes the embedded vector size; i_t denotes the input of the GRU, i.e. the t-th vector representation in the behavior sequence, i_t = e_b[t]; the network output value h_t denotes the t-th hidden state and is a latent expression of the past state of the power equipment. The main role of the state trend capture layer is to provide the temporal characteristics of the state representation.
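A NumPy sketch of the Eq. 2.18 recurrence follows, written in the standard GRU form (reset gate r_t, update gate z_t) that matches the gates named in the claims; the tiny dimensions are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(i_t, h_prev, P):
    """One GRU step (Eq. 2.18):
    r_t = sigmoid(W_r i_t + U_r h_{t-1} + b_r)    reset gate
    z_t = sigmoid(W_z i_t + U_z h_{t-1} + b_z)    update gate
    h~_t = tanh(W_h i_t + U_h (r_t * h_{t-1}) + b_h)
    h_t = (1 - z_t) * h_{t-1} + z_t * h~_t        (* = Hadamard product)"""
    r = sigmoid(P["Wr"] @ i_t + P["Ur"] @ h_prev + P["br"])
    z = sigmoid(P["Wz"] @ i_t + P["Uz"] @ h_prev + P["bz"])
    h_tilde = np.tanh(P["Wh"] @ i_t + P["Uh"] @ (r * h_prev) + P["bh"])
    return (1 - z) * h_prev + z * h_tilde

n_hid, k = 3, 2                            # hidden size n_hid, embedding size k
rng = np.random.default_rng(1)
P = {name: rng.normal(scale=0.1, size=(n_hid, k) if name.startswith("W")
                      else (n_hid, n_hid) if name.startswith("U")
                      else n_hid)
     for name in ["Wr", "Wz", "Wh", "Ur", "Uz", "Uh", "br", "bz", "bh"]}
h = np.zeros(n_hid)
sequence = rng.normal(size=(5, k))         # 5 embedded states e_b[t]
hidden_states = []
for e_t in sequence:
    h = gru_step(e_t, h, P)
    hidden_states.append(h)
```

Each h_t is a convex mixture of the previous state and a tanh candidate, so the hidden states stay bounded while accumulating the temporal trend of the sequence.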
(2) A key state perception layer (attention mechanism module) is defined. An attention mechanism is used to obtain the association between the current power equipment state and the power equipment states at different time points in the history sequence; the process is measured by similarity and can be regarded as a process of perception evolution, as shown in Equation 2.19.
a_t = softmax(F(h_t, e_a)),  O_b = Σ_t a_t h_t  (2.19)
where e_a is the target vector and F[·] denotes the attention scoring function; here the attention distribution is computed in a bilinear manner. Multiplying the hidden states of the power equipment at the different positions by the attention scores yields the state sequence feature representation O_b. The key role of the key state perception layer is to provide the local characteristics of the trend representation. This completes the capture of the state sequence features of a single device in the two dimensions of space and time.
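The attention pooling of Eq. 2.19 can be sketched as follows, with the bilinear scoring form F(h_t, e_a) = h_tᵀ W e_a that the text names; the exact parameterisation is otherwise assumed.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, e_a, W):
    """Eq. 2.19: score every hidden state against the target vector e_a
    with a bilinear form F(h_t, e_a) = h_t^T W e_a, normalise the scores
    with softmax into an attention distribution a_t, and return the
    state sequence feature O_b = sum_t a_t * h_t."""
    scores = H @ W @ e_a          # one bilinear score per time step
    a = softmax(scores)
    return a @ H, a               # O_b and the attention weights

rng = np.random.default_rng(2)
H = rng.normal(size=(5, 3))       # 5 hidden states h_t of size n_hid = 3
e_a = rng.normal(size=3)
W = rng.normal(size=(3, 3))
O_b, a = attention_pool(H, e_a, W)
```

The softmax guarantees the attention weights form a probability distribution over the history positions, so O_b is a convex combination of the hidden states.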
(3) A graph attention mechanism layer (graph attention mechanism module) is defined; while feature extraction, i.e. the modeling of the behavior sequence, is guaranteed, a modeling approach that makes full use of the environmental features is required. Let the embedded vector of the portrait features of a given power device be e_p, and the embedded vectors of its associated power device nodes be e'_p1, e'_p2, …, e'_pn; a new embedded vector e'_p is then generated for each power device node using a graph attention mechanism, as shown in Equation 2.20.
e'_p = [e_p, e'_p1, e'_p2, …, e'_pn, X]  (2.20)
where
Figure BDA0003486959580000102
X is the coefficient set. Applying graph attention to the feature vector of each power device yields the output vector e'_p, which completes the acquisition of auxiliary information from the surrounding related devices for a single device.
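The patent's coefficient formula is only available as an image, so the sketch below substitutes the standard GAT-style scoring (LeakyReLU of a learned linear score, softmax over the neighbours) as a stated assumption for how the coefficient set X could be obtained.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention(e_p, neighbors, W, a_vec):
    """GAT-style aggregation (assumed form): score each neighbour j of the
    device node with LeakyReLU(a^T [W e_p || W e_j]), softmax the scores
    into coefficients (the coefficient set X), and append the
    coefficient-weighted neighbour information to the node's own features."""
    he = W @ e_p
    hn = [W @ e_j for e_j in neighbors]
    scores = np.array([np.concatenate([he, h]) @ a_vec for h in hn])
    scores = np.where(scores > 0, scores, 0.2 * scores)   # LeakyReLU
    coeff = softmax(scores)                               # coefficient set X
    agg = sum(c * h for c, h in zip(coeff, hn))           # auxiliary information
    return np.concatenate([he, agg]), coeff

rng = np.random.default_rng(3)
d = 4
e_p = rng.normal(size=d)
neighbors = [rng.normal(size=d) for _ in range(3)]
W = rng.normal(size=(d, d))
a_vec = rng.normal(size=2 * d)
e_p_new, coeff = graph_attention(e_p, neighbors, W, a_vec)
```

The output keeps the device's own transformed features while mixing in neighbour information weighted by the learned coefficients, which is the "auxiliary information of surrounding related equipment" the text describes.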
(4) The embedded representations of the features obtained in steps (2) and (3) and the embedded representation of the labels are fused as the input of the classification neural network of the power equipment fault detection model; the final result is shown in Equation 2.21.
y_pred = softmax(W_deep [O_b, e'_p, e_a] + b_deep)  (2.21)
where y_pred ∈ {0, 1, 2}, e_a is the vector representation of the prediction target, and W_deep, b_deep are the output layer parameters; by stacking multiple layers, the combination forms among high-order features can be better captured. The construction of the power equipment fault detection model is thus completed.
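The classification head of Eq. 2.21 can be sketched as below; fusing O_b, e'_p and e_a by plain concatenation and applying a softmax output layer are assumptions consistent with the surrounding text, not the patent's verified layout.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(O_b, e_p_new, e_a, W_deep, b_deep):
    """Eq. 2.21 (assumed concatenation fusion): the state sequence feature
    O_b, the device state feature e'_p and the target embedding e_a are
    fused into one vector and mapped by the output layer parameters
    W_deep, b_deep to class probabilities over y_pred in {0, 1, 2}."""
    fused = np.concatenate([O_b, e_p_new, e_a])
    probs = softmax(W_deep @ fused + b_deep)
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(4)
O_b, e_p_new, e_a = rng.normal(size=3), rng.normal(size=4), rng.normal(size=3)
W_deep = rng.normal(size=(3, 10))      # 3 fault classes, fused dimension 10
b_deep = rng.normal(size=3)
y_pred, probs = classify(O_b, e_p_new, e_a, W_deep, b_deep)
```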
In step 4, the embedded representation generated in step 2 is first used as the input of the preprocessing model obtained in step 3; the output of the preprocessing model is then used as the input of the power equipment fault detection model; finally, the power equipment fault detection model is trained and generated.
The specific steps of the step 4 comprise:
(1) The network architecture adopted in the invention is implemented on the basis of an attention mechanism and a GRU network. First, the corresponding embedded representation is generated for the constructed balanced samples by the embedding layer of step 2. Second, a GRU module based on an attention mechanism is defined, which combines the attention mechanism and the GRU to capture the temporal and spatial key features of the device nodes (defined as state sequence features), and combines the graph attention mechanism so that a single power equipment node obtains auxiliary information from related power equipment; the generated features are fused through vector concatenation. Finally, the attention mechanism defined in step 3 is used in combination with the GRU network: on the one hand, the capturing capability for the state sequence features is learned, and on the other hand, fault detection of the power equipment is realized through the learning of the features. The network architecture is shown in Fig. 4.
(2) The number of training iterations, epochs, is set, starting from epochs = 1.
(3) The data set sample embedded representation obtained in step 2 is fed batch by batch into the network combining the attention mechanism with the GRU, and a prediction for the input embedded representation is obtained.
(4) The loss function between the estimated value and the true label value is calculated and minimized.
(5) Steps (3) and (4) are repeated within the value range defined by epochs, finally training the power equipment fault detection model that combines the attention mechanism with the GRU.
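Steps (2)–(5) above amount to a standard epoch loop that minimises the loss between prediction and true label. The sketch below illustrates this with a toy softmax layer standing in for the full attention + GRU network, which is an illustrative simplification.

```python
import numpy as np

def softmax_rows(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, n_classes, epochs=200, lr=0.5):
    """Epoch loop of steps (2)-(5): feed the embedded representations
    through the model, compute the cross-entropy loss against the true
    labels, and update the parameters to minimise it."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                  # one-hot labels
    losses = []
    for _ in range(epochs):
        P = softmax_rows(X @ W)
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        W -= lr * X.T @ (P - Y) / len(X)      # gradient of the cross-entropy
    return W, losses

# Toy, linearly separable 3-class data standing in for embedded samples
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
              [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]])
y = np.array([0, 0, 1, 1, 2, 2])
W, losses = train(X, y, n_classes=3)
pred = softmax_rows(X @ W).argmax(axis=1)
```

On this toy data the loss decreases over the epochs and the trained layer classifies all samples correctly, mirroring how the full model is driven by loss minimisation against the labels.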
In application systems for classification and detection problems, the main concern is feature extraction capability: whether the features can be sufficiently mined and utilized. The innovation of this method rests mainly on the combination of the GRU with an attention mechanism: on the one hand, characteristics in time and space can be obtained; on the other hand, the attention mechanism can utilize and mine the state sequence features while eliminating feature noise to a certain extent. In addition, besides the state feature extraction of the power equipment itself, a graph attention mechanism is used to obtain the auxiliary information that related equipment can provide. Based on these two aspects, the characteristics of the power equipment can be fully extracted and utilized; the network model can learn the features better, and those features enable more accurate fault detection of the power equipment. To address these problems, many methods choose improvements in directions such as deepening the network or multi-modal fusion. The method proposed here differs from the prior art mainly in realizing multi-dimensional feature mining of the input data by combining an attention mechanism with GRU techniques, thereby obtaining more feature information and improving the fault detection capability of the network model.
The design of the method is based on an attention mechanism combined with a GRU network to fully mine the characteristics of the power equipment, which better serves the classification and detection task of the downstream neural network. In the attention mechanism combined with GRU module, a multi-head attention mechanism and a GRU network model are used: the GRU network first extracts the temporal and spatial features of the power equipment, and the attention mechanism then captures the state sequence features, completing the feature mining of the power equipment. In the graph attention mechanism model, a graph attention mechanism is used to fully mine the auxiliary information of the related equipment nodes, i.e. the environmental information. This information is combined with the embedded representation of the labels for attention-based aligned fusion, generating a vector of the same size as the state code input to the downstream neural network; the output serves as the input of the downstream neural network and is trained against the labels, finally generating the power equipment fault detection model.
Based on the above improvements, the power equipment fault detection model based on the attention mechanism combined with the GRU is realized, which can effectively improve the accuracy of power equipment fault detection.
The working principle of the invention is as follows:
First, SC-SMOTE up-sampling is performed on the power grid power equipment samples to generate balanced sample data. Next, the embedding layer produces an embedded representation of the input samples, and the generated embedded representation is used as the input of the attention mechanism combined with GRU network model. The attention mechanism combined with GRU module and the graph attention mechanism model first fully mine the input power equipment features; the features generated by each module are then fused on the basis of the attention mechanism and finally used as the input of the downstream classification neural network, whose output is trained against the defined labels. Model training is thus completed, generating a model that can accurately detect power equipment faults.
It should be emphasized that the embodiments described herein are illustrative and not restrictive, and thus the present invention includes, but is not limited to, the embodiments described in this detailed description, as well as other embodiments that can be derived by one skilled in the art from the teachings herein, and are within the scope of the present invention.

Claims (10)

1. A power equipment fault detection model based on an attention mechanism combined with a GRU, characterized by comprising a classification neural network model, wherein the training data of the classification neural network model is derived from a preprocessing model, and the preprocessing model comprises an up-sampling module, a word embedding representation learning module, a GRU module, an attention mechanism module and a graph attention mechanism module, wherein:
the up-sampling module is used for converting the input unbalanced power equipment data into balanced data;
the word embedding representation learning module is used for embedding the balanced data and outputting an embedded representation of the history state sequence of the power equipment, an embedded representation of the label data, and an embedded representation of the power equipment portrait features;
the GRU module is used for extracting the temporal and spatial features of the power equipment from the embedded representation of the history state sequence output by the word embedding representation learning module;
the attention mechanism module is used for extracting the state sequence features from the temporal and spatial features of the equipment;
the graph attention mechanism module is used for extracting environment information of the power equipment from the embedded representation of the power equipment portrait characteristics;
and the state sequence features, the embedded representation of the label data and the environment information are aligned and fused as the training data input of the classification neural network.
2. The power equipment fault detection model of claim 1, wherein the up-sampling module employs the SC-SMOTE up-sampling algorithm, which specifically comprises:
step 21: the input power equipment data set is traversed, and the majority-class and minority-class seed samples are determined;
step 22: according to the seed sample information, up-sampling is performed simultaneously on the majority class and the minority classes, and the number of samples to be generated by each minority-class seed sample is calculated;
step 23: after the number of samples to be generated by each minority-class seed sample is obtained, linear interpolation is performed to obtain the final new samples, and the newly generated samples and the original seed samples are combined to generate a balanced sample data set;
step 24: the data in the generated balanced sample data set is given an embedded representation.
3. The power equipment fault detection model of claim 2, wherein said step 21 comprises: traversing the power equipment data set, a neighbor sample set D_n of each sample x is determined using the KNN algorithm; within the neighbor set D_n, the set of samples of the same class as sample x is denoted D_same, and the set of samples of a different class from sample x is denoted D_other; the numbers of samples in D_same and D_other are compared according to the formula:
Figure FDA0003486959570000021
whether the sample x is a seed sample is judged, and a seed sample label S is added to the original data set.
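This seed determination step can be sketched as follows. The decision formula itself is only an image in the patent, so the |D_same| ≥ |D_other| rule below is an assumption consistent with the surrounding text.

```python
import numpy as np

def mark_seed_samples(X, y, k=3):
    """Step 21 (assumed decision rule): for each sample x, find its k
    nearest neighbours (KNN, Euclidean distance), split them into
    D_same (same class as x) and D_other (different class), and mark x
    as a seed sample (S = 1) when |D_same| >= |D_other|."""
    S = np.zeros(len(X), dtype=int)
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                         # exclude the sample itself
        nn = np.argsort(d)[:k]
        same = np.sum(y[nn] == y[i])          # |D_same|
        S[i] = int(same >= k - same)          # |D_same| >= |D_other|
    return S

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [0.15]])
y = np.array([0, 0, 0, 1, 1, 1, 1])           # one class-1 point inside the class-0 region
S = mark_seed_samples(X, y, k=3)
```

Under this rule, the isolated class-1 point surrounded by class-0 neighbours is not marked as a seed, so noisy samples do not drive the subsequent sample generation.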
4. The power equipment fault detection model of claim 2, wherein the calculation of the number of generated samples in step 22 comprises the following equations:
label_diff_j = N_maj − N_j
Figure FDA0003486959570000022
Figure FDA0003486959570000023
wherein, label _ diffjRepresenting majority and minority classes C in the original datasetjThe sample number difference of (2); n is a radical ofjIndicates belonging to class CjThe number of samples of (a); ds_jIndicates belonging to class CjThe set of seed samples of (a); rsjRepresenting majority seed samples and class CjThe proportion of the seed sample of (a); n is a radical ofgjRepresents each of the categories CjThe number of new samples generated by averaging the seed samples; label _ diffj/|Ds_jI represents class CjThe number of samples that need to be generated in order to balance the difference in the number of raw data.
5. The power equipment fault detection model of claim 2, wherein said step 23 comprises the following steps: after the number of samples to be generated by the seed samples is obtained, the K-means algorithm is adopted, and the cluster center coordinates are updated at each iterative partition of the samples according to the Euclidean distance between the cluster center and the sampled samples; the hyper-parameter k_c of the K-means algorithm denotes the number of class clusters, and its value depends on the ratio of the majority-class sample count to the minority-class sample count in the data set, formulated as:
Figure FDA0003486959570000024
after the data set has been clustered with the K-means algorithm, each sample is marked with the cluster label C of its cluster, and the data set is updated as:
Figure FDA0003486959570000025
6. The power equipment fault detection model of claim 5, wherein the data processing method for the newly generated samples in step 23 is as follows: samples of the same category within each class cluster are screened out to form a sample set D_c, each sample containing a feature set F = {f_1, f_2, …, f_p}; then, according to the feature type, the following is executed: when the feature is a discrete feature, field selection during data generation is performed according to the probability distribution of the different fields; when the feature is a continuous feature, during data generation a value is randomly selected in the interval [min, max] of the feature value as the generated value, where max and min are respectively the maximum and minimum of the feature value.
7. The power equipment fault detection model of claim 5, wherein the linear interpolation in step 23 is to obtain the final new sample by:
for each seed sample x_i of category y_i in class cluster c_i, there is a number N_gi of new samples to be generated; each time a new sample is generated, according to N_gi and the distribution FD[c_i][y_i] of each feature of the cluster in which the seed lies, an auxiliary sample x_temp is first generated, and linear interpolation is then performed to obtain the final generated sample x_new;
wherein the construction of the auxiliary sample x_temp must satisfy three rules:
the temporary sample x_temp and the sample x_i belong to the same category label y_i;
the temporary sample x_temp and the sample x_i belong to the same cluster c_i;
the temporary sample x_temp has the same features as the sample x_i, but the value of each feature is obtained by random sampling from the feature distribution FD[c_i][y_i] of class cluster c_i;
after the temporary sample x_temp is obtained, the new sample x_new is obtained by means of linear interpolation:
x_temp = [f_1, f_2, …, f_p],  f_p = Random(FD[c_i][y_i][p])
x_new = x_i + Random(0, 1) × (x_temp − x_i)
after the seed sample has cycled through N_gj sample generation operations, a group of generated samples based on that seed sample is obtained, the generated samples and the seed sample belonging to the same category; after every seed sample has completed sample generation, the resulting generated sample set D_g is merged with the original data set D to obtain the finally required balanced data set D_balance.
8. The power equipment fault detection model of claim 1, wherein the data outcome of the classification neural network model is represented as:
y_pred = softmax(W_deep [O_b, e'_p, e_a] + b_deep)
where y_pred ∈ {0, 1, 2}; O_b is the vector representation of the state sequence features; e'_p is the state feature vector representation of the power equipment; e_a is the vector representation of the prediction target; and W_deep, b_deep are the output layer parameters.
9. The power equipment fault detection model of claim 8, wherein the vector representation O_b of the state sequence features is obtained through the calculation of the GRU module and the attention mechanism module;
the data processing procedure and result of the GRU module are expressed as:
r_t = σ(W_r i_t + U_r h_{t−1} + b_r)
z_t = σ(W_z i_t + U_z h_{t−1} + b_z)
h̃_t = tanh(W_h i_t + U_h (r_t ⊙ h_{t−1}) + b_h)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t
where σ denotes the Sigmoid function, ⊙ denotes the Hadamard product, and
W_r, W_z, W_h ∈ ℝ^{n_hid×k},  U_r, U_z, U_h ∈ ℝ^{n_hid×n_hid},  b_r, b_z, b_h ∈ ℝ^{n_hid}
where n_hid denotes the GRU network hidden layer size and k denotes the embedded vector size; i_t denotes the input of the GRU, i.e. the t-th vector representation in the history state sequence, i_t = e_b[t]; the output value h_t of the GRU module denotes the t-th hidden state and is a latent expression form of the past state of the power equipment;
the data processing procedure and the result of the attention mechanism module are expressed as follows:
a_t = softmax(F(h_t, e_a))
O_b = Σ_t a_t h_t
wherein a_t denotes the attention score obtained from the attention distribution calculation of the attention mechanism module, and F[·] denotes the attention scoring function.
10. The power equipment fault detection model of claim 8, wherein the state feature vector representation e'_p of the power equipment is obtained through the calculation of the graph attention mechanism module; the data processing result of the graph attention mechanism module is expressed as:
e'_p = [e_p, e'_p1, e'_p2, …, e'_pn, X]
where e_p denotes the embedded vector of the features of the power device, and e'_p1, e'_p2, …, e'_pn are the embedded vectors of the power device nodes associated with the power device whose embedded vector is e_p;
Figure FDA0003486959570000051
X is the coefficient set;
the feature vector of each power device is processed with the graph attention mechanism to obtain the output vector e'_p.
CN202210084475.8A 2022-01-25 2022-01-25 Power equipment fault detection model based on attention mechanism combined with GRU Pending CN114528755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210084475.8A CN114528755A (en) 2022-01-25 2022-01-25 Power equipment fault detection model based on attention mechanism combined with GRU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210084475.8A CN114528755A (en) 2022-01-25 2022-01-25 Power equipment fault detection model based on attention mechanism combined with GRU

Publications (1)

Publication Number Publication Date
CN114528755A true CN114528755A (en) 2022-05-24

Family

ID=81621151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210084475.8A Pending CN114528755A (en) 2022-01-25 2022-01-25 Power equipment fault detection model based on attention mechanism combined with GRU

Country Status (1)

Country Link
CN (1) CN114528755A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512460A (en) * 2022-09-29 2022-12-23 北京交通大学 High-speed train axle temperature long-time prediction method based on graph attention model
CN115512460B (en) * 2022-09-29 2024-04-16 北京交通大学 High-speed train shaft temperature long-time prediction method based on graph attention model
CN115952064A (en) * 2023-03-16 2023-04-11 华南理工大学 Multi-component fault interpretation method and device for distributed system
CN115952064B (en) * 2023-03-16 2023-08-18 华南理工大学 Multi-component fault interpretation method and device for distributed system
CN116401515A (en) * 2023-06-07 2023-07-07 吉林大学 Ocean current prediction method for ocean observation data
CN117370790A (en) * 2023-10-13 2024-01-09 江苏智谨创新能源科技有限公司 Automatic fault alarm method and system for photovoltaic power generation assembly
CN117725210A (en) * 2023-11-16 2024-03-19 南京审计大学 Malicious user detection method for social question-answering platform

Similar Documents

Publication Publication Date Title
CN114528755A (en) Power equipment fault detection model based on attention mechanism combined with GRU
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN112347859B (en) Method for detecting significance target of optical remote sensing image
CN108985380B (en) Point switch fault identification method based on cluster integration
Jin et al. Video-text as game players: Hierarchical banzhaf interaction for cross-modal representation learning
CN111783540B (en) Method and system for recognizing human body behaviors in video
CN107423747A (en) A kind of conspicuousness object detection method based on depth convolutional network
CN117237559B (en) Digital twin city-oriented three-dimensional model data intelligent analysis method and system
CN115168443A (en) Anomaly detection method and system based on GCN-LSTM and attention mechanism
CN114487673A (en) Power equipment fault detection model based on Transformer and electronic equipment
CN115908908A (en) Remote sensing image gathering type target identification method and device based on graph attention network
Du et al. Convolutional neural network-based data anomaly detection considering class imbalance with limited data
Shajihan et al. CNN based data anomaly detection using multi-channel imagery for structural health monitoring
CN115661480A (en) Image anomaly detection method based on multi-level feature fusion network
CN116071352A (en) Method for generating surface defect image of electric power safety tool
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN116894180B (en) Product manufacturing quality prediction method based on different composition attention network
Yuan et al. A fusion TFDAN-Based framework for rotating machinery fault diagnosis under noisy labels
Li et al. Dual-source gramian angular field method and its application on fault diagnosis of drilling pump fluid end
CN117152504A (en) Space correlation guided prototype distillation small sample classification method
CN115664970A (en) Network abnormal point detection method based on hyperbolic space
CN116383747A (en) Anomaly detection method for generating countermeasure network based on multi-time scale depth convolution
CN113835964B (en) Cloud data center server energy consumption prediction method based on small sample learning
CN113342982B (en) Enterprise industry classification method integrating Roberta and external knowledge base
CN115659135A (en) Anomaly detection method for multi-source heterogeneous industrial sensor data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination