CN116562114A - Power transformer fault diagnosis method based on graph convolution neural network

Power transformer fault diagnosis method based on graph convolution neural network

Info

Publication number
CN116562114A
CN116562114A
Authority
CN
China
Prior art keywords
gcn
transformer
layer
samples
data
Prior art date
Legal status
Pending
Application number
CN202211479696.1A
Other languages
Chinese (zh)
Inventor
何明锋
陈飞
李付林
叶国庆
李毓
张波
黄红辉
季克勤
侯健生
黄健
王珂
沃建栋
叶宏
贺燕
吴峰
金坚锋
杨艳天
王赢聪
Current Assignee
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd filed Critical Jinhua Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202211479696.1A priority Critical patent/CN116562114A/en
Publication of CN116562114A publication Critical patent/CN116562114A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Abstract

The invention discloses a power transformer fault diagnosis method based on a graph convolutional neural network (GCN). The method comprises the following steps: S1, constructing a GCN-based power transformer fault diagnosis method; S2, constructing the GCN structure; S3, performing transformer fault diagnosis with the GCN, including data processing and determination of the model outputs. Transformer faults are divided into thermal faults and discharge faults: the thermal faults include low-temperature thermal faults (LT), medium-temperature thermal faults (MT), and high-temperature thermal faults (HT), and the discharge faults include partial discharge (PD), low-energy discharge (LD), and high-energy discharge (HD). The diagnosis process includes (1) data import and normalization; (2) reconstruction and division of the data; and (3) initialization of the structure and parameters of the GCN. The model is then trained, and finally the performance of the GCN is evaluated.

Description

Power transformer fault diagnosis method based on graph convolution neural network
Technical Field
The invention belongs to the technical field of power grid fault diagnosis, and particularly relates to a power transformer fault diagnosis method based on a graph convolution neural network.
Background
With the continuous expansion of power systems, the number of transformers keeps growing, and so does the volume of transformer fault data; the accuracy of traditional fault diagnosis methods therefore needs further improvement. The operating state of a transformer directly affects the safety and power quality of the whole power system. Once a transformer fails, it can cause a local or even large-scale power outage, disrupting the operation of the power system and causing economic losses. Accurate diagnosis of the state of power transformers is therefore of great importance to the power system.
At present, most large transformers are oil-immersed. When a fault occurs, an oil-immersed transformer releases a large amount of gas into the oil, and the dissolved gas content is the key indicator used by dissolved gas analysis (DGA) for fault diagnosis. Existing DGA-based transformer fault diagnosis methods can be divided into two categories: distance-based methods and model-based methods. The first category mainly includes case-based reasoning, expert systems, k-nearest neighbors (KNN), and Siamese neural networks. Although these distance-based methods make full use of historical data and prior knowledge through similarity metrics, they struggle to capture the complex nonlinear relationships between the dissolved gases and the corresponding labels, which limits the accuracy of transformer fault diagnosis. The second category of traditional model-based algorithms includes support vector machines (SVM), multi-layer perceptrons (MLP), extreme gradient boosting (XGBoost), and LightGBM. While these traditional methods suit smaller datasets, their limited feature-extraction capability makes it difficult to fully exploit the latent relationships between the dissolved gases and the corresponding labels.
The power transformer fault diagnosis method based on deep learning disclosed in publication CN 115329908 A obtains a fault sample dataset of a power transformer, preprocesses it into a training dataset, constructs a preset CNN-based fault diagnosis model, trains it on the training dataset, optimizes the hyperparameters of the trained model to obtain a target fault diagnosis model, and then analyzes new data with that model to output the corresponding fault diagnosis result. However, the method is complex, hard for engineering personnel to operate, considers few influencing factors, and does not comprehensively account for the causes of power transformer faults. A graph convolutional neural network (GCN) can effectively mine the complex nonlinear relationships between fault types and dissolved gases through graph convolution layers with strong learning capability, and can also use an adjacency matrix to represent the similarity between unknown samples and labeled samples, thereby improving the accuracy of transformer fault diagnosis.
Disclosure of Invention
The flow of the power transformer fault diagnosis method based on the graph convolution neural network provided by the invention is shown in figure 1, and the method specifically comprises the following 3 steps.
S1 Constructing the GCN-based power transformer fault diagnosis method
The objective is to construct a graph G = (V, E) that takes as input a feature matrix X of dissolved gas contents and an adjacency matrix A of the samples:
Input = (X, A)  (1)
where X is an n × d feature matrix composed of the feature description x_i of each node i, n is the number of nodes (in transformer fault diagnosis, the number of samples), and d is the number of input features. The adjacency matrix represents, in matrix form, a similarity measure between the historical data and the current samples.
The output of the graph convolution layers is an n × F node matrix Y, where F is the number of transformer states. Each graph convolution layer can be written as a nonlinear function:
H^{(i+1)} = f(H^{(i)}, A),  i = 0, 1, ..., L  (2)
where L is the number of graph convolution layers. When i = 0, H^{(0)} = X; when i = L, H^{(L)} = Y. Specific graph convolution layers differ only in the choice of the activation function f and the manner of parameterization.
A simple form of the layer-wise propagation rule of the graph convolution layer is:
f(H^{(i)}, A) = σ(A H^{(i)} W^{(i)})  (3)
where σ is a nonlinear activation function, such as the rectified linear unit (ReLU), and W^{(i)} is the weight matrix of the i-th graph convolution layer.
Although the graph convolution layer is very powerful, it has two limitations to be addressed:
1) Multiplication by the adjacency matrix A means that, for each node, the layer sums the feature vectors of all neighboring nodes but not of the node itself (unless the graph contains self-loops). This limitation can be solved by enforcing self-loops in the graph-structured data, i.e., adding the identity matrix to A:
A' = A + I  (4)
2) The second limitation is that A' is not normalized, so the multiplication may change the scale of the feature vectors, which can be verified by examining the eigenvalues of A'. To solve this problem, A' should be normalized symmetrically:
A'' = D^{-1/2} A' D^{-1/2}  (5)
where D is the diagonal degree matrix of A':
D_{ii} = Σ_j A'_{ij}  (6)
After applying these two techniques, the new propagation rule of the graph convolution layer becomes:
f(H^{(i)}, A) = σ(A'' H^{(i)} W^{(i)})  (7)
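For concreteness, the preprocessing of Eqs. (4)-(6) and the propagation rule of Eq. (7) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patent's implementation; the function names are ours, and a bias term is included to match Eqs. (8)-(11) below.

```python
import numpy as np

def normalize_adjacency(A):
    """Eqs. (4)-(6): add self-loops, then symmetrically normalize."""
    A_prime = A + np.eye(A.shape[0])           # A' = A + I
    d = A_prime.sum(axis=1)                    # degrees D_ii = sum_j A'_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D^{-1/2}
    return D_inv_sqrt @ A_prime @ D_inv_sqrt   # A'' = D^{-1/2} A' D^{-1/2}

def gcn_layer(H, A_norm, W, b):
    """Eq. (7) with ReLU as sigma and an added bias: ReLU(A'' H W + b)."""
    return np.maximum(0.0, A_norm @ H @ W + b)
```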
S2 Constructing the GCN structure
Typically, a neural network diagnoses the fault type Y of the power transformer from the input dissolved gas content X. In addition to X, the GCN requires an n × n adjacency matrix A, where n is the number of samples in the dataset. Samples in the training set and validation set are linked only to samples with the same label; for example, if the i-th and j-th samples both belong to partial discharge, A(i, j) = A(j, i) = 1. For samples in the test set (unknown samples), a Siamese network is used to extract low-dimensional features of the input variables, from which the Euclidean distances between samples are computed. KNN is then used to find the k labeled samples closest to each unknown sample, and these are considered connected; for example, if k = 1 and the j-th sample is the Euclidean nearest neighbor of the i-th unknown sample, A(i, j) = A(j, i) = 1. In this way, the adjacency matrix A represents a similarity measure between the historical data and the current samples.
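A simplified sketch of this adjacency construction is given below. For brevity it measures Euclidean distance on the normalized features directly, whereas the method described above first extracts low-dimensional Siamese embeddings; the function name and the sample ordering (training block first, test block last) are assumptions of the sketch.

```python
import numpy as np

def build_adjacency(X_train, y_train, X_test, k=10):
    """n x n adjacency A: same-label links for labeled samples, KNN links for unknown ones."""
    n_train, n_test = len(X_train), len(X_test)
    A = np.zeros((n_train + n_test, n_train + n_test))
    # Labeled samples: A(i, j) = 1 iff samples i and j share the same fault label.
    same = (y_train[:, None] == y_train[None, :]).astype(float)
    A[:n_train, :n_train] = same - np.eye(n_train)   # self-loops are added later via A' = A + I
    # Unknown samples: connect each to its k nearest labeled samples.
    for i in range(n_test):
        dist = np.linalg.norm(X_train - X_test[i], axis=1)
        for j in np.argsort(dist)[:k]:
            A[n_train + i, j] = A[j, n_train + i] = 1.0
    return A
```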
As shown in Fig. 2, the graph-structured data (X, A) is fed to the first layer, whose output is H^{(1)}. Specifically, the mixed feature matrix A''X linearly combines the feature vector of each node with those of its neighbors using the weights in A''. This new feature set is then multiplied by the weight matrix W_1 and a bias vector b_1 is added. Finally, an activation function (e.g., ReLU) is applied to obtain the output of the first layer:
H^{(1)} = ReLU(A'' X W_1 + b_1)  (8)
Likewise, the output of the second graph convolution layer is
H^{(2)} = ReLU(A'' H^{(1)} W_2 + b_2)  (9)
where W_2 and b_2 are the weight matrix and bias vector of the second graph convolution layer.
Two dense layers follow the graph convolution layers. Before being input to them, the data H^{(2)} must be flattened. In the third layer, the output H^{(3)} is obtained through a weight matrix W_3, a bias vector b_3, and an activation function:
H^{(3)} = ReLU(H^{(2)} W_3 + b_3)  (10)
The output of the fourth layer, through the Softmax function, is:
Y = Softmax(H^{(3)} W_4 + b_4)  (11)
wherein Y is the type of transformer fault.
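Putting Eqs. (8)-(11) together, the whole forward pass is four matrix operations. The sketch below reuses normalize_adjacency and gcn_layer from step S1 and stores the weights in a plain dictionary; both choices are ours, not the patent's.

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))      # numerically stable Softmax
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(X, A, p):
    """Two graph convolution layers, two dense layers, Softmax output (Eqs. 8-11)."""
    A_norm = normalize_adjacency(A)
    H1 = gcn_layer(X, A_norm, p["W1"], p["b1"])       # Eq. (8)
    H2 = gcn_layer(H1, A_norm, p["W2"], p["b2"])      # Eq. (9)
    H3 = np.maximum(0.0, H2 @ p["W3"] + p["b3"])      # Eq. (10): dense + ReLU
    return softmax(H3 @ p["W4"] + p["b4"])            # Eq. (11): n x 7 state matrix Y
```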
S3, performing transformer fault diagnosis by using GCN
S301 data processing
In normal operation, the solid organic insulating material and insulating oil of a power transformer age gradually under the combined action of the electric and thermal fields. Small amounts of gases, such as hydrogen and low-molecular-weight hydrocarbon gases, dissolve in the transformer oil. If a discharge or thermal fault occurs, the dissolved gas content rises rapidly; if gas is produced faster than the transformer oil can absorb it, the excess gas diffuses into the gas relay and triggers an alarm. At present, a common detection technique for diagnosing the fault type of an oil-immersed transformer is to analyze the dissolved gas content, and several new features constructed with the IEC ratio method, the Dornenburg ratio method, and the Rogers ratio method further improve the accuracy of fault diagnosis. Previous work has shown that CO and CO2 correlate weakly with the transformer fault type, while H2, C2H6, CH4, C2H2, and C2H4 correlate strongly with it. Therefore, the contents of the dissolved gases H2, C2H6, CH4, C2H2, and C2H4 are chosen as the original features, and 4 new features constructed by the Rogers ratio method (CH4/H2, C2H2/C2H4, C2H4/C2H6, C2H6/CH4) are further taken as input variables of the GCN.
Since the values of these 9 features differ greatly in scale, using them directly as input variables would harm model performance and could even prevent the loss function from converging. Therefore, the nine features are mapped to the interval [0, 1] by min-max normalization before being fed to the GCN:
x_i' = (x_i - x_{i,min}) / (x_{i,max} - x_{i,min})  (12)
where x_i and x_i' denote the i-th feature before and after normalization, respectively, and x_{i,min} and x_{i,max} denote the minimum and maximum values of the i-th feature before normalization.
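Eq. (12) applied column-wise over the feature matrix is one line of NumPy; a minimal sketch:

```python
import numpy as np

def min_max_normalize(X):
    """Map each of the 9 feature columns into [0, 1] per Eq. (12)."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)   # assumes x_max > x_min for every feature
```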
S302 Output variables of the model
Transformer faults can be classified into thermal faults and discharge faults. Specifically, the thermal faults include low-temperature thermal faults (LT), medium-temperature thermal faults (MT), and high-temperature thermal faults (HT), and the discharge faults include partial discharge (PD), low-energy discharge (LD), and high-energy discharge (HD). To effectively calculate the cross-entropy loss function during GCN training, the state types of the transformer are encoded as shown in Table 1.
Table 1 Transformer state encoding

Transformer state | Encoding
Normal state | 1000000
Low-temperature thermal fault (LT) | 0100000
Medium-temperature thermal fault (MT) | 0010000
High-temperature thermal fault (HT) | 0001000
Partial discharge (PD) | 0000100
Low-energy discharge (LD) | 0000010
High-energy discharge (HD) | 0000001
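A small helper reproducing the encoding of Table 1 (the state order is taken from the table; the function name is ours):

```python
import numpy as np

STATES = ["Normal", "LT", "MT", "HT", "PD", "LD", "HD"]   # row order of Table 1

def one_hot(labels):
    """Encode state labels as 1 x 7 vectors so cross-entropy can be computed directly."""
    Y = np.zeros((len(labels), len(STATES)))
    for i, lab in enumerate(labels):
        Y[i, STATES.index(lab)] = 1.0
    return Y
```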
S303 fault diagnosis procedure
The fault diagnosis process of the transformer based on GCN is shown in fig. 4, and the specific steps are as follows:
(1) data import and normalization
The dissolved gases H2, C2H6, CH4, C2H2, and C2H4 are taken as the original features, and 4 new features are constructed using the Rogers ratio method. These 9 features serve as input variables of the GCN. To obtain the adjacency matrix A, a Siamese network extracts low-dimensional features of the input variables, from which the Euclidean distances between samples are computed; KNN then finds the k samples closest to each unknown sample, which are considered connected. In addition, the input data is mapped into [0, 1] using min-max normalization.
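As an illustration, the 9 input features can be assembled as follows; the sketch assumes `gas` is an n × 5 array with columns ordered H2, C2H6, CH4, C2H2, C2H4 and strictly positive gas contents.

```python
import numpy as np

def build_features(gas):
    """Stack the 5 raw gas contents with the 4 Rogers ratios into an n x 9 matrix X."""
    H2, C2H6, CH4, C2H2, C2H4 = gas.T
    ratios = np.stack([CH4 / H2, C2H2 / C2H4, C2H4 / C2H6, C2H6 / CH4], axis=1)
    return np.hstack([gas, ratios])
```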
(2) Reconstruction and partitioning of data
Because the adjacency matrix contains a large number of zero elements, storing it densely wastes space, so it is reshaped into a sparse matrix in coordinate (COO) format. In the dataset, 75% of the samples are used to train the GCN, and the remaining samples are used to evaluate the performance of the model.
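A minimal sketch of this step, assuming X (the n × 9 feature matrix) and A (the dense adjacency matrix) from the previous steps:

```python
import numpy as np
from scipy.sparse import coo_matrix

A_sparse = coo_matrix(A)            # keep only the non-zero entries of the adjacency matrix

n = X.shape[0]
idx = np.random.permutation(n)      # random 75% / 25% split of the samples
train_idx, test_idx = idx[: int(0.75 * n)], idx[int(0.75 * n):]
```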
(3) Initializing the structure and parameters of a GCN
To improve the accuracy of transformer fault diagnosis, the optimal structure and parameters must be explored before training the GCN. These mainly include the number of graph convolution layers, the number of iterations, the size k used to build the adjacency matrix A, and the choice of optimizer. One basic structure of the GCN is shown in Table 2. The graph convolution filters have sizes of 8 and 16, respectively, and all graph convolution layers use ReLU activations. To mitigate overfitting, each graph convolution layer is followed by a dropout layer with probability 0.25. Both the filter sizes and the dropout probability are the best values found experimentally.
Table 2 Basic structure of the GCN

Through many experiments in the case study, a dense-layer output of a 1 × 7 vector was finally determined to represent the state of each unknown sample.
S304 training
The GCN is trained with the back-propagation algorithm, which mainly consists of two steps: forward propagation and backward weight updating. In forward propagation, the input variables pass through several graph convolution layers and are then fed to the dense layers, which output the label of each sample. The diagnosis result and the ground truth are used to compute the loss function (error). In backward weight updating, the chain rule propagates the error from the output layer back through the intermediate layers, and the weights of each layer are then updated by gradient descent. When the set number of iterations is reached, the test set is used to evaluate the performance of the GCN. In addition, an ensemble technique is used during training to improve the accuracy of the model.
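The loop below sketches this training procedure in PyTorch under a few assumptions: `model` implements the forward pass of Eqs. (8)-(11), `y` holds integer class labels, and Adam (selected later in the case study) is the optimizer; the ensemble step is omitted.

```python
import torch

def train_gcn(model, X, A_norm, y, train_idx, iterations=800, lr=0.01):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(iterations):
        optimizer.zero_grad()
        out = model(X, A_norm)                        # forward propagation over all nodes
        loss = loss_fn(out[train_idx], y[train_idx])  # loss on the training samples only
        loss.backward()                               # backward pass via the chain rule
        optimizer.step()                              # gradient-descent (Adam) weight update
    return model
```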
S305 evaluating the performance of GCN
For a binary classification problem, the result is either the positive class or the negative class, and precision and recall can be used to evaluate model performance; however, these indicators do not directly apply to multi-class problems such as transformer fault diagnosis. In general, a k-class problem can be decomposed into k binary classification problems. Therefore, in addition to accuracy, Macro F1 and the geometric mean of the per-class recalls (G-mean) are used to evaluate the performance of the model. Both Macro F1 and G-mean are positive indicators: the larger the value, the better the performance.
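These three indicators can be computed as follows; a minimal sketch using scikit-learn, with the G-mean taken as the geometric mean of the per-class recalls:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

def evaluate(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    recalls = recall_score(y_true, y_pred, average=None)    # recall of each of the 7 classes
    g_mean = float(np.prod(recalls) ** (1.0 / len(recalls)))
    return acc, macro_f1, g_mean
```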
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a power transformer fault diagnosis method based on a graph convolution neural network, which is used for improving the accuracy of transformer fault diagnosis. Constructing a GCN-based power transformer fault diagnosis method; constructing a GCN structure; performing transformer fault diagnosis by using GCN, and performing data processing on transformer faults, wherein the diagnosis process comprises (1) data importing and normalizing; (2) reconstructing and dividing data; (3) the structure and parameters of the GCN are initialized. And training the model, and finally evaluating the performance of the GCN. The graph convolutional neural network (GCN) can effectively mine complex nonlinear relations between fault types and dissolved gases by using graph convolution layers with strong learning capability, and can also use an adjacency matrix to represent similarity measurement between unknown samples and marked samples, so that the accuracy of transformer fault diagnosis is improved.
Drawings
FIG. 1 is a flow chart of the method steps of the present invention;
FIG. 2 is a graph of a graph convolutional neural network;
FIG. 3 is a diagram of a multi-layer perceptron;
FIG. 4 is a GCN-based fault diagnosis flowchart;
FIG. 5 is a diagram of a GCN training process;
FIG. 6 is a graph of test set indicators for different k values.
Detailed Description
(1) Description of data
To test the performance of the GCN for transformer fault diagnosis, simulations and analyses were performed on an actual dataset from the State Grid company. The voltage rating of the sampled transformers is 220 kV. After data cleaning, 718 samples remain in the dataset, covering 7 state types: normal, low-temperature thermal fault, medium-temperature thermal fault, high-temperature thermal fault, partial discharge, low-energy discharge, and high-energy discharge. 75% of the samples were used for training, and the remaining samples were used as test samples to evaluate the performance of the model. The sample counts of the respective state types are shown in Table 3.
Table 3 Sample distribution of the dataset

Transformer state | Total samples | Training samples | Test samples
Normal state | 52 | 39 | 13
LT | 99 | 74 | 25
MT | 73 | 55 | 18
HT | 168 | 126 | 42
PD | 105 | 79 | 26
LD | 42 | 31 | 11
HD | 179 | 134 | 45
(2) GCN training effect
In order to clearly observe the training process of the GCN, fig. 5 shows the trend of the loss function as the number of iterations increases.
In the early stages of the training process, the loss function of the training set drops rapidly with increasing number of iterations. When the number of iterations is greater than 400, the loss function tends to be constant and does not continue to drop, indicating that the GCN has converged. Generally, the training process of the GCN is relatively stable, and the convergence speed is high. To ensure convergence of the GCN, after 800 iterations, the unknown samples were diagnosed using the GCN.
To analyze the effect of the number of layers on GCN performance, the number of layers was gradually increased and the index of the test set under the different layers was counted as shown in Table 4.
Table 4 Test set indices for different numbers of graph convolution layers

Layers | Accuracy | Macro F1 | G-mean | Time/s | Parameters
1 | 0.714 | 0.689 | 0.709 | 35.27 | 80
2 | 0.793 | 0.772 | 0.791 | 65.99 | 152
3 | 0.780 | 0.753 | 0.769 | 96.35 | 224
4 | 0.699 | 0.672 | 0.688 | 126.75 | 296
5 | 0.683 | 0.655 | 0.673 | 157.65 | 368
From Table 4 it can be concluded that: 1) At first, the test set indices increase with the number of graph convolution layers, indicating that the performance of the GCN improves gradually. With 2 graph convolution layers, the accuracy, Macro F1, and G-mean of the GCN reach their maxima and fault diagnosis performance is best. This suggests that too few graph convolution layers make it difficult to mine the complex nonlinear relationships between the dissolved gases and the transformer states, while adding graph convolution layers improves the feature-learning capability of the GCN and thus the diagnostic accuracy. 2) The number of parameters to be trained increases linearly with the number of graph convolution layers. When the number of layers exceeds 2, adding more graph convolution layers makes the performance of the GCN worse, because the number of samples in the dataset is limited: too many layers not only increase the number of parameters to be trained and consume much training time, but also lead to overfitting, reducing diagnostic accuracy. 3) In general, the number of graph convolution layers should be chosen according to the size of the dataset; with a small number of samples, the GCN achieves better performance with 2 graph convolution layers.
The size of k determines how many training samples are connected to each unknown sample, which directly affects the adjacency matrix A. To investigate the effect of k on GCN performance, k was varied from 1 to 20 and the performance of the GCN on the test set was recorded, as shown in Fig. 6.
When k is very small, each unknown sample is connected only to its nearest samples, which makes it difficult to fully exploit the similarity between samples and limits accuracy. Conversely, if k is very large, each unknown sample is connected to many samples, possibly including samples of different types, which introduces noise and limits the performance of the GCN. Thus, as k increases, the accuracy, Macro F1, and G-mean first increase and then decrease. When k = 10, the accuracy, Macro F1, and G-mean on the test set reach their maxima and the fault diagnosis performance of the GCN is best.
After initializing the structure and parameters of the GCN, the neural network is optimized by gradient descent. Currently popular gradient-based optimizers include SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, and Nadam. These methods are often used as black boxes, since their principles are too complex to analyze in practical engineering. To find the optimizer best suited to the GCN for transformer fault detection, each of these popular optimizers was configured and simulated, and the test set indices were recorded as shown in Table 5.
Table 5 Test set indices under different optimizers
As can be seen from Table 5, the GCN performs well when RMSprop, Adamax, Nadam, or Adam is used as the optimizer. Specifically, the accuracy, Macro F1, and G-mean of the Adam algorithm are slightly higher than those of the other optimizers, indicating that Adam is the most suitable optimizer for the GCN in transformer fault detection. Furthermore, the Adagrad, Adadelta, and SGD algorithms all correspond to an accuracy below 0.7, indicating that they are not suitable for GCN-based transformer fault diagnosis.
3.3 Comparison under different input features
To illustrate the effectiveness of the GCN, common distance-based methods (KNN and the Siamese network) and model-based methods (CNN, MLP, XGBoost, and SVM) were used as baselines, and the test set indices were compared under different input features. Through repeated experiments, the optimal parameters and structures of the baseline algorithms were determined as follows:
1) For KNN, k is 7. 2) The Siamese network comprises two CNNs with shared weights, which are used to compute the Euclidean distance between a pair of input samples. Specifically, the two convolution layers have 16 and 36 filters, respectively, with 2 × 2 convolution kernels; the max-pooling size is 2 × 2; a dropout layer with probability 0.25 is inserted after the two convolution layers to mitigate overfitting; the dense layers have 8 and 1 neurons, respectively; and all layers use the ReLU activation function. 3) The CNN comprises two convolution layers, two max-pooling layers, two dropout layers, and two dense layers; the dropout probability is 0.25, the convolution kernel size is 3, the pooling size is 2 × 2, the convolution layers use ReLU activations, and the dense layers have 14 and 7 neurons, respectively. 4) For the MLP, the input layer has 9 neurons and the hidden layers have 9 and 7 neurons, respectively; the number of output neurons equals the number of classes, and a dropout layer is inserted between the dense layers to mitigate overfitting. 5) For XGBoost, the maximum depth is 5, the gamma value is 0.2, the subsample rate is 0.6, and the minimum child weight is 3. 6) For the SVM, the transformer fault types are classified using the fitcecoc function in MATLAB 2018a.
The above algorithms were trained under the different input feature sets, and the simulation results on the test set are shown in Tables 6 and 7.
Table 6 Test set results with the 5 raw features as input

Method | Accuracy | Macro F1 | G-mean
GCN | 0.781 | 0.758 | 0.776
CNN | 0.740 | 0.717 | 0.734
MLP | 0.633 | 0.602 | 0.620
XGBoost | 0.634 | 0.606 | 0.626
SVM | 0.642 | 0.616 | 0.635
KNN | 0.669 | 0.644 | 0.664
Siamese network | 0.751 | 0.725 | 0.741
Table 7 Test set results with all 9 features as input
From the tables it can be concluded that: 1) Accuracy represents the probability that the model correctly identifies positive and negative classes; if the dataset is imbalanced, the majority classes can dominate the accuracy. Therefore, Macro F1 and G-mean were also chosen to evaluate model performance. As the tables show, the values of accuracy, Macro F1, and G-mean are similar, indicating that the model diagnoses the various fault types with very similar probabilities of success. In addition, the G-mean values are all greater than 0, indicating that no fault type is missed entirely by any model. 2) The accuracy, Macro F1, and G-mean of every algorithm in Table 7 exceed those in Table 6, indicating that the four features constructed by the Rogers ratio method improve the performance of each algorithm in transformer fault diagnosis to some extent. 3) Under both input feature sets, the GCN achieves higher diagnostic performance than the other algorithms. Taking Table 7 as an example, the accuracy, Macro F1, and G-mean of the GCN are 79.3%, 77.2%, and 79.1%, respectively. Compared with CNN, MLP, XGBoost, SVM, KNN, and the Siamese network, the accuracy of the GCN is higher by 3.7%, 12.1%, 15.2%, 10.5%, 9.3%, and 2.2%; the Macro F1 of the GCN is higher by 4.0%, 12.3%, 15.9%, 11.0%, 9.5%, and 2.7%; and the G-mean of the GCN is higher by 4.3%, 12.2%, 16.0%, 10.8%, 9.6%, and 2.8%, respectively. 4) Last but not least, both the CNN and the GCN use convolution layers to extract features of the input data; the difference is that the GCN additionally accounts for the similarity between unknown and labeled samples through the adjacency matrix. The GCN outperforms the CNN, which suggests that ignoring the similarity between samples limits the accuracy of fault diagnosis. The GCN can both explore the relationships between features and fault types with its graph convolution layers and exploit the similarity between samples, and can therefore diagnose transformer fault types more accurately.

Claims (4)

1. A power transformer fault diagnosis method based on a graph convolutional neural network, characterized by comprising the following steps: S1, constructing a GCN-based power transformer fault diagnosis method; S2, constructing the GCN structure; S3, performing transformer fault diagnosis with the GCN, including data processing and determination of the model outputs, wherein transformer faults are divided into thermal faults and discharge faults; specifically, the thermal faults include low-temperature thermal faults (LT), medium-temperature thermal faults (MT), and high-temperature thermal faults (HT), and the discharge faults include partial discharge (PD), low-energy discharge (LD), and high-energy discharge (HD); the diagnosis process includes (1) data import and normalization, (2) reconstruction and division of the data, and (3) initialization of the structure and parameters of the GCN; the model is then trained, and finally the performance of the GCN is evaluated.
2. The power transformer fault diagnosis method based on a graph convolutional neural network according to claim 1, characterized in that S1, constructing the GCN-based power transformer fault diagnosis method, comprises: constructing a graph G = (V, E) that takes as input a feature matrix X of dissolved gas contents and an adjacency matrix A of the samples:
Input = (X, A)  (1)
wherein X is an n × d feature matrix composed of the feature description x_i of each node i, n is the number of nodes (the number of samples in transformer fault diagnosis), and d is the number of input features; the adjacency matrix represents, in matrix form, a similarity measure between the historical data and the current samples;
the output of the graph convolution layers is an n × F node matrix Y, where F is the number of transformer states; each graph convolution layer can be written as a nonlinear function:
H^{(i+1)} = f(H^{(i)}, A),  i = 0, 1, ..., L  (2)
where L is the number of graph convolution layers; when i = 0, H^{(0)} = X, and when i = L, H^{(L)} = Y; specific graph convolution layers differ only in the choice of the activation function f and the manner of parameterization;
a simple form of the layer-wise propagation rule of the graph convolution layer is:
f(H^{(i)}, A) = σ(A H^{(i)} W^{(i)})  (3)
where σ is a nonlinear activation function, such as the rectified linear unit (ReLU), and W^{(i)} is the weight matrix of the i-th graph convolution layer;
although the graph convolution layer is very powerful, it has two limitations to be addressed:
1) multiplication by the adjacency matrix A means that, for each node, the layer sums the feature vectors of all neighboring nodes but not of the node itself (unless the graph contains self-loops); this limitation is solved by enforcing self-loops in the graph-structured data, i.e., adding the identity matrix to A:
A' = A + I  (4)
2) the second limitation is that A' is not normalized, so the multiplication may change the scale of the feature vectors, which can be verified by examining the eigenvalues of A'; to solve this problem, A' is normalized symmetrically:
A'' = D^{-1/2} A' D^{-1/2}  (5)
where D is the diagonal degree matrix of A':
D_{ii} = Σ_j A'_{ij}  (6)
after applying these two techniques, the new propagation rule of the graph convolution layer becomes:
f(H^{(i)}, A) = σ(A'' H^{(i)} W^{(i)})  (7).
3. The power transformer fault diagnosis method based on a graph convolutional neural network according to claim 1, characterized in that S2, constructing the GCN structure, comprises: typically, a neural network diagnoses the fault type Y of the power transformer from the input dissolved gas content X; in addition to X, the GCN requires an n × n adjacency matrix A, where n is the number of samples in the dataset; samples in the training set and validation set are linked only to samples with the same label; for example, if the i-th and j-th samples both belong to partial discharge, A(i, j) = A(j, i) = 1; for samples in the test set (unknown samples), a Siamese network is used to extract low-dimensional features of the input variables, from which the Euclidean distances between samples are computed; KNN is then used to find the k labeled samples closest to each unknown sample, which are considered connected; for example, if k = 1 and the j-th sample is the Euclidean nearest neighbor of the i-th unknown sample, A(i, j) = A(j, i) = 1; in this way, the adjacency matrix A represents a similarity measure between the historical data and the current samples;
as shown in fig. 2, the graph-structured data (X, A) is fed to the first layer, whose output is H^{(1)}; specifically, the mixed feature matrix A''X linearly combines the feature vector of each node with those of its neighbors using the weights in A''; this new feature set is multiplied by the weight matrix W_1 and a bias vector b_1 is added; an activation function (e.g., ReLU) is then applied to obtain the output of the first layer:
H^{(1)} = ReLU(A'' X W_1 + b_1)  (8)
likewise, the output of the second graph convolution layer is
H^{(2)} = ReLU(A'' H^{(1)} W_2 + b_2)  (9)
where W_2 and b_2 are the weight matrix and bias vector of the second graph convolution layer;
two dense layers follow the graph convolution layers; before being input to them, the data H^{(2)} must be flattened; in the third layer, the output H^{(3)} is obtained through a weight matrix W_3, a bias vector b_3, and an activation function:
H^{(3)} = ReLU(H^{(2)} W_3 + b_3)  (10)
the output of the fourth layer, through the Softmax function, is:
Y = Softmax(H^{(3)} W_4 + b_4)  (11)
    wherein Y is the type of transformer fault.
4. The power transformer fault diagnosis method based on a graph convolutional neural network according to claim 1, characterized in that S3, performing transformer fault diagnosis with the GCN, comprises:
S301 data processing
in normal operation, the solid organic insulating material and insulating oil of a power transformer age gradually under the combined action of the electric and thermal fields; small amounts of gases, such as hydrogen and low-molecular-weight hydrocarbon gases, dissolve in the transformer oil; if a discharge or thermal fault occurs, the dissolved gas content rises rapidly, and if gas is produced faster than the transformer oil can absorb it, the excess gas diffuses into the gas relay and triggers an alarm; at present, a common detection technique for diagnosing the fault type of an oil-immersed transformer is to analyze the dissolved gas content, and several new features constructed with the IEC ratio method, the Dornenburg ratio method, and the Rogers ratio method further improve the accuracy of fault diagnosis; previous work has shown that CO and CO2 correlate weakly with the transformer fault type, while H2, C2H6, CH4, C2H2, and C2H4 correlate strongly with it; therefore, the contents of the dissolved gases H2, C2H6, CH4, C2H2, and C2H4 are chosen as the original features, and 4 new features constructed by the Rogers ratio method (CH4/H2, C2H2/C2H4, C2H4/C2H6, C2H6/CH4) are further taken as input variables of the GCN;
since the values of these 9 features differ greatly in scale, using them directly as input variables would harm model performance and could even prevent the loss function from converging; therefore, the nine features are mapped to the interval [0, 1] by min-max normalization before being fed to the GCN:
x_i' = (x_i - x_{i,min}) / (x_{i,max} - x_{i,min})  (12)
where x_i and x_i' denote the i-th feature before and after normalization, respectively, and x_{i,min} and x_{i,max} denote the minimum and maximum values of the i-th feature before normalization;
S302 output variables of the model
transformer faults can be classified into thermal faults and discharge faults; specifically, the thermal faults include low-temperature thermal faults (LT), medium-temperature thermal faults (MT), and high-temperature thermal faults (HT), and the discharge faults include partial discharge (PD), low-energy discharge (LD), and high-energy discharge (HD); to effectively calculate the cross-entropy loss function during GCN training, the various state types of the transformer are encoded;
S303 fault diagnosis procedure
    The fault diagnosis process of the transformer based on GCN is shown in fig. 4, and the specific steps are as follows:
    (1) data import and normalization
the dissolved gases H2, C2H6, CH4, C2H2, and C2H4 are taken as the original features, and 4 new features are constructed using the Rogers ratio method; these 9 features serve as input variables of the GCN; to obtain the adjacency matrix A, a Siamese network extracts low-dimensional features of the input variables, from which the Euclidean distances between samples are computed; KNN then finds the k samples closest to each unknown sample, which are considered connected; in addition, the input data is mapped into [0, 1] using min-max normalization;
    (2) reconstruction and partitioning of data
because the adjacency matrix contains a large number of zero elements, storing it densely wastes space, so it is reshaped into a sparse matrix in coordinate (COO) format; in the dataset, 75% of the samples are used to train the GCN, and the remaining samples are used to evaluate the performance of the model;
    (3) initializing the structure and parameters of a GCN
to improve the accuracy of transformer fault diagnosis, the optimal structure and parameters must be explored before training the GCN; these mainly include the number of graph convolution layers, the number of iterations, the size k used to build the adjacency matrix A, and the choice of optimizer;
S304 training
the GCN is trained with the back-propagation algorithm, which mainly consists of two steps: forward propagation and backward weight updating; in forward propagation, the input variables pass through several graph convolution layers and are then fed to the dense layers, which output the label of each sample; the diagnosis result and the ground truth are used to compute the loss function (error); in backward weight updating, the chain rule propagates the error from the output layer back through the intermediate layers, and the weights of each layer are then updated by gradient descent; when the set number of iterations is reached, the test set is used to evaluate the performance of the GCN; in addition, an ensemble technique is used during training to improve the accuracy of the model;
S305 evaluating the performance of the GCN
for a binary classification problem, the result is either the positive class or the negative class, and precision and recall can be used to evaluate model performance; however, these indicators do not directly apply to multi-class problems such as transformer fault diagnosis; in general, a k-class problem can be decomposed into k binary classification problems; therefore, in addition to accuracy, Macro F1 and the geometric mean of the per-class recalls (G-mean) are used to evaluate the performance of the model; both Macro F1 and G-mean are positive indicators: the larger the value, the better the performance.
CN202211479696.1A 2023-04-25 2023-04-25 Power transformer fault diagnosis method based on graph convolution neural network Pending CN116562114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211479696.1A CN116562114A (en) 2023-04-25 2023-04-25 Power transformer fault diagnosis method based on graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211479696.1A CN116562114A (en) 2023-04-25 2023-04-25 Power transformer fault diagnosis method based on graph convolution neural network

Publications (1)

Publication Number Publication Date
CN116562114A true CN116562114A (en) 2023-08-08

Family

ID=87490406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211479696.1A Pending CN116562114A (en) 2023-04-25 2023-04-25 Power transformer fault diagnosis method based on graph convolution neural network

Country Status (1)

Country Link
CN (1) CN116562114A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235465A (en) * 2023-11-15 2023-12-15 国网江西省电力有限公司电力科学研究院 Transformer fault type diagnosis method based on graph neural network wave recording analysis
CN117235465B (en) * 2023-11-15 2024-03-12 国网江西省电力有限公司电力科学研究院 Transformer fault type diagnosis method based on graph neural network wave recording analysis
CN117950906A (en) * 2024-03-27 2024-04-30 西南石油大学 Method for deducing fault cause of server based on neural network of table graph
CN117950906B (en) * 2024-03-27 2024-06-04 西南石油大学 Method for deducing fault cause of server based on neural network of table graph
CN117970224A (en) * 2024-03-29 2024-05-03 国网福建省电力有限公司 CVT error state online evaluation method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication