CN112308208B - Transformer fault diagnosis method based on deep learning model - Google Patents

Transformer fault diagnosis method based on deep learning model

Info

Publication number
CN112308208B
Authority
CN
China
Prior art keywords
fault diagnosis
training
model
layer
transformer
Prior art date
Legal status
Active
Application number
CN202011010335.3A
Other languages
Chinese (zh)
Other versions
CN112308208A (en)
Inventor
陈仕骄
姚小龙
黄建涛
吴国天
杨昌隆
苏克勇
罗巍
周嘉璐
宁嘉
王一
Current Assignee
Puer Supply Power Bureau of Yunnan Power Grid Co Ltd
Original Assignee
Puer Supply Power Bureau of Yunnan Power Grid Co Ltd
Priority date
2020-09-23
Filing date
2020-09-23
Publication date
2023-01-24
Application filed by Puer Supply Power Bureau of Yunnan Power Grid Co Ltd
Priority to CN202011010335.3A
Publication of CN112308208A: 2021-02-02
Publication of CN112308208B (grant): 2023-01-24
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent


Abstract

The invention relates to a transformer fault diagnosis method based on a deep learning model, and belongs to the technical field of transformer fault diagnosis. The method comprises the steps of data acquisition of fault samples, creation of a DBN-based fault diagnosis model, model pre-training, model fine-tuning and online fault identification. The method captures data characteristics more easily from small data samples, has a stronger capability of autonomously extracting features, and achieves higher target identification accuracy. Experiments show that the average accuracy of the RBM-T transformer fault diagnosis model exceeds 94%, and the method is easy to popularize and apply.

Description

Transformer fault diagnosis method based on deep learning model
Technical Field
The invention belongs to the technical field of transformer fault diagnosis, and particularly relates to a transformer fault diagnosis method based on a deep learning model.
Background
With the continuous improvement of the economic level, the power system in China has developed at an unprecedented pace: the scale of the power grid keeps expanding, the number of substations has multiplied, and the safe and reliable operation of the power system faces new challenges. A power system failure or large-scale outage can cause huge economic losses, endanger public safety, and bring serious social consequences. As the core element of the power system, the transformer steps up the voltage generated by power plants for transmission to the grid, and steps the high grid voltage down to rated voltage for delivery to users. It also changes the voltage levels of the main grid frame and is the principal tool for interconnecting regional power grids, so its operating state is decisive for the safe and stable operation of the whole power system.
Transformer fault diagnosis technology can improve the health level of transformers and is of great significance for the safe and stable operation of the power grid. Many researchers have proposed fault diagnosis methods for transformers based on, for example, convolutional neural networks, support vector machines and recurrent neural networks. Zhao Wenqing et al. proposed a latent fault diagnosis strategy for transformers based on improved principal component analysis. Yang Tingfang et al. proposed learning the dissolved gases and fault characteristics of a power transformer with a BP neural network to construct a diagnosis model. Such transformer fault diagnosis models need a large number of training samples before an accurate model can be obtained. In practical engineering applications, however, the number of transformer fault samples is often small. With too few samples, a fault diagnosis model constructed with these methods cannot autonomously complete deep-level feature extraction, and problems such as local optimal solutions, low identification accuracy and weak generalization capability easily arise. How to overcome these defects of the prior art is therefore an urgent problem in the technical field of transformer fault diagnosis.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a transformer fault diagnosis method based on a deep learning model.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
The transformer fault diagnosis method based on the deep learning model comprises the following steps:
Step (1), data acquisition of fault samples: collect fault samples and divide them into a training sample set and a test sample set; each fault sample comprises the fault type of the transformer and the contents of H2, CH4, C2H6, C2H4 and C2H2 dissolved in oil;
Step (2), creation of a DBN-based fault diagnosis model: the inputs of the fault diagnosis model are the contents of H2, CH4, C2H6, C2H4 and C2H2 dissolved in oil; the output is the fault type;
Set the number of RBM layers and the number of nodes of each RBM layer.
Let each RBM be a two-layer network consisting of 1 visible layer v and 1 hidden layer h, where v = [v_1, v_2, ..., v_n] denotes the n input nodes of the visible layer; h = [h_1, h_2, ..., h_m] denotes the m output nodes of the hidden layer; w = [w_ij] is the n×m connection weight matrix from the input layer to the output layer, where i = 1, 2, ..., n and j = 1, 2, ..., m; a = [a_1, a_2, ..., a_n], where a_i is the bias of the ith visible unit v_i; b = [b_1, b_2, ..., b_m], where b_j is the bias of the jth hidden unit h_j;
Step (3), model pre-training: import the training sample set into the model of step (2), train each RBM layer from bottom to top, layer by layer, and obtain the initialization parameters w_ij, a_i, b_j of each RBM layer;
Step (4), model fine-tuning: reversely adjust the pre-training parameters with a BP neural network algorithm to obtain the final DBN fault diagnosis model, where the pre-training parameters are the initialization parameters w_ij, a_i, b_j of each RBM layer;
Step (5), online fault identification: import the real-time operation data of the transformer into the final DBN fault diagnosis model for fault identification.
Further, it is preferable that, in the step (1), the ratio of the number of samples of the training sample set to the number of samples of the test sample set is set to 9:1.
Further, it is preferable that, in step (1), the failure types are divided into five types: iron core grounding, coil failure, bare metal overheating, screen discharge, and turn insulation damage.
Further, in step (2), it is preferable that the number of RBM layers be 5 and the number of nodes per RBM layer be 10.
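To make steps (1) to (5) and the preferences above concrete, the following is a minimal sketch in Python. It is an approximation under stated assumptions, not the patent's implementation: scikit-learn's BernoulliRBM stands in for the RBM layers (two layers here rather than the preferred five), a logistic-regression head stands in for the full BP fine-tuning of step (4), and the gas-content array and fault labels are synthetic placeholders rather than real dissolved-gas measurements.

```python
# A minimal sketch of the five-step flow, assuming scikit-learn is available.
# BernoulliRBM stands in for the RBM layers; the logistic-regression head
# only approximates the patent's BP fine-tuning stage.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Step (1): synthetic stand-in for DGA samples. Columns are the contents of
# H2, CH4, C2H6, C2H4, C2H2 dissolved in oil; labels are the five fault types.
X = rng.random((500, 5))
y = rng.integers(0, 5, size=500)          # 0..4 = five fault types
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)  # the patent's 9:1 split

# Step (2): a DBN-like stack of RBM layers with 10 nodes each.
# BernoulliRBM expects inputs scaled to [0, 1], hence the MinMaxScaler stage.
model = Pipeline([
    ("scale", MinMaxScaler()),
    ("rbm1", BernoulliRBM(n_components=10, learning_rate=0.05, n_iter=40,
                          random_state=0)),
    ("rbm2", BernoulliRBM(n_components=10, learning_rate=0.05, n_iter=40,
                          random_state=0)),
    ("head", LogisticRegression(max_iter=1000)),
])

# Steps (3)-(4): Pipeline.fit trains the RBMs layer by layer (pre-training)
# and then the supervised head; true joint BP fine-tuning is not shown.
model.fit(X_train, y_train)

# Step (5): online identification on held-out operating data.
print("test accuracy:", model.score(X_test, y_test))
```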
Further, preferably, in step (3), the specific method of model pre-training is as follows:
Step (3.1): input a training sample v_i; set the learning rate μ, the number of training iterations T and the allowable reconstruction error threshold R, and initialize the parameters θ = {w_ij, a_i, b_j};
Step (3.2): select 1 training sample v_i, calculate P(h_j = 1|v) using equation (7), sample 1 hidden layer vector h from this distribution, and then calculate the forward gradient ⟨v_i h⟩;

P(h_j = 1|v) = σ(b_j + Σ_i v_i w_ij)   (7)

The forward gradient is calculated as follows:

⟨v_i h_j⟩ = v_i · P(h_j = 1|v)

σ represents the activation function;
Step (3.3): based on the hidden layer vector h, calculate P(v_i = 1|h) using equation (8), denoted v_i1;

P(v_i = 1|h) = σ(a_i + Σ_j w_ij h_j)   (8)

Step (3.4): based on v_i1, repeat step (3.2) to obtain a new hidden layer vector, denoted h_1, and then calculate the inverse gradient ⟨v_i1 h_1⟩;

⟨v_i1 h_1j⟩ = v_i1 · P(h_j = 1|v_i1)

Step (3.5): update the parameters according to equations (10) to (12), the updated values being denoted w′_ij, a′_i, b′_j:

w′_ij = w_ij + prob(⟨v_i h⟩ − ⟨v_i1 h_1⟩)   (10)
a′_i = a_i + μ|v_i − v_i1|   (11)
b′_j = b_j + μ|h_j − h_j1|   (12)

prob represents a random number of 0 or 1;
Step (3.6): repeat steps (3.2) to (3.5) to calculate new training samples v_i2, v_i3, ..., v_iT and new hidden layer vectors h_2, h_3, ..., h_T, where T is the current number of training iterations; then calculate the reconstruction error r = ||v_iT − v_i||. When T reaches the maximum number of training iterations T_max or r falls below the reconstruction error threshold R, training stops.
Further, it is preferable that the learning rate μ be taken as 98%, the number of training iterations as 450, and the allowable reconstruction error threshold R as 0.98.
Further, preferably, in step (4), the method for reversely adjusting the pre-training parameters is as follows: take the average of the parameters obtained by the reverse calculation and the initialization parameters.
Further, preferably, in step (4), after the final DBN fault diagnosis model is established, the method further includes a step of testing the performance of the transformer fault diagnosis model, specifically: import the test sample set into the fault diagnosis model obtained in step (4), perform fault prediction and compare it with the actual working conditions, thereby evaluating the performance of the fault diagnosis model.
When training starts, the initialization can be performed by referring to preliminary values from other models; alternatively, all parameters can be set to 1.
In the invention, σ represents the activation function. In information science, the Sigmoid function is often used as the activation function of a neural network because it is monotonically increasing and has a monotonically increasing inverse; it maps variables into the interval (0, 1) and is expressed as:

σ(x) = 1 / (1 + e^(−x))
the invention aims to reverse adjustment so as to improve the accuracy and adaptability of parameters.
The Deep Belief Network (DBN) is a probability generation model commonly used in deep learning. It captures data characteristics more easily from small data samples, has a stronger capability of autonomously extracting features, and achieves higher target identification accuracy. Compared with convolutional neural network algorithms, the DBN needs fewer training samples, is flexible and simple to operate, and is well compatible with other algorithms.
Compared with the prior art, the invention has the following beneficial effects:
the method provided by the invention can capture data characteristics more easily under a small data sample, and has stronger capability of autonomously extracting characteristics and higher target identification precision. Compared with a convolutional neural network algorithm, the DBN has the advantages of small training sample size, flexibility and simplicity in operation, good compatibility with other algorithms and the like. And transformer faults are just characterized by small data samples, so the DBN is used for fault diagnosis of the transformer. Experiments show that the RBM-T average accuracy of the transformer fault diagnosis model is higher than 94%.
Drawings
FIG. 1 is a schematic diagram of a restricted Boltzmann machine.
Detailed Description
The present invention will be described in further detail with reference to examples.
It will be appreciated by those skilled in the art that the following examples only illustrate the invention and should not be taken as limiting its scope. Where no particular techniques or conditions are specified in the examples, they are performed according to the techniques or conditions described in the literature in the art or according to the product specifications. Materials or equipment whose manufacturers are not indicated are conventional, commercially available products.
1 Restricted Boltzmann machine (RBM) of the deep learning model
Deep Belief Networks (DBNs) are a common probability generation model in deep learning. A deep belief network is formed by stacking multiple layers of Restricted Boltzmann Machines (RBMs); by adjusting the weights between neurons, each RBM layer lets the whole neural network generate the training data with maximum probability, thereby carrying out feature extraction and target identification. A typical RBM structure is a two-layer network consisting of 1 visible layer v and 1 hidden layer h, as shown in fig. 1. The visible layer receives input data and the hidden layer extracts data features; the neurons of the visible layer and the neurons of the hidden layer are fully connected with each other, while neurons within the same layer are independent of each other.
In fig. 1, v = [v_1, v_2, ..., v_n] denotes the n input nodes of the visible layer; h = [h_1, h_2, ..., h_m] denotes the m output nodes of the hidden layer; w = [w_ij] is the n×m connection weight matrix from the input layer to the output layer, where i = 1, 2, ..., n and j = 1, 2, ..., m; a = [a_1, a_2, ..., a_n], where a_i is the bias of the ith visible unit v_i; b = [b_1, b_2, ..., b_m], where b_j is the bias of the jth hidden unit h_j.
For a given visible layer input v and hidden layer output h, the energy function E(v, h|θ) of the restricted Boltzmann machine is

E(v, h|θ) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_i Σ_j v_i w_ij h_j   (1)

where the sums run over i = 1, ..., n and j = 1, ..., m.
In the formula, θ = {w_ij, a_i, b_j} are the parameters of the RBM model.
The energy function E(v, h|θ) can be regarded as the energy between each visible node and each hidden node under the current node distribution state of the visible and hidden layers. Assuming each node of the visible layer and the hidden layer takes one of the 2 states 0 and 1, the visible and hidden layer nodes can form t = 2^(n+m) state pairs. Applying exponentiation and normalization to the energy function gives the joint probability distribution P(v, h|θ) of the node sets {v, h} of the visible and hidden layers in a given state pair:

P(v, h|θ) = exp(−E(v, h|θ)) / Z   (2)

In the formula,

Z = Σ_{v,h} exp(−E(v, h|θ))   (3)

is a normalization factor (also called the partition function), which sums over all possible state pairs of the visible and hidden layer nodes.
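For a toy-sized binary RBM, equations (1) to (3) can be evaluated exactly by enumerating all 2^(n+m) state pairs; the NumPy sketch below (an illustration with arbitrary random parameters, not values from the patent) does exactly that, and also shows why Z becomes intractable at realistic sizes.

```python
# Sketch of equations (1)-(3) for a toy binary RBM, assuming NumPy only.
import itertools
import numpy as np

n, m = 3, 2                        # toy sizes; t = 2**(n+m) = 32 state pairs
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(n, m))
a = rng.normal(scale=0.1, size=n)  # visible biases a_i
b = rng.normal(scale=0.1, size=m)  # hidden biases b_j

def energy(v, h):
    # Equation (1): E(v,h|theta) = -sum a_i v_i - sum b_j h_j - sum v_i w_ij h_j
    return -(a @ v) - (b @ h) - (v @ w @ h)

# Equation (3): Z sums exp(-E) over all 2**(n+m) binary state pairs.
states_v = list(itertools.product([0, 1], repeat=n))
states_h = list(itertools.product([0, 1], repeat=m))
Z = sum(np.exp(-energy(np.array(v), np.array(h)))
        for v in states_v for h in states_h)

# Equation (2): joint probability of one particular state pair.
v0, h0 = np.ones(n), np.ones(m)
print("P(v0, h0) =", np.exp(-energy(v0, h0)) / Z)
```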
According to formula (2), the joint probability distribution P(v, h) in any state can in theory be obtained, but the computation of Z is extremely expensive, so the joint probability distribution is solved approximately by the Gibbs sampling method. The marginal probability distribution P(v) of the visible layer node set is obtained by summing over all binary states of the m nodes in the hidden layer h:

P(v|θ) = (1/Z) Σ_h exp(−E(v, h|θ))   (4)
Similarly, the marginal probability distribution P(h) of the hidden layer h is

P(h|θ) = (1/Z) Σ_v exp(−E(v, h|θ))   (5)
The marginal distribution is called a likelihood function; for example, P(h) is the probability that the hidden layer node set is in a certain state distribution. From P(h), the conditional probability distribution P(v|h) of the visible layer can be obtained:

P(v|h) = P(v, h) / P(h)   (6)

Similarly, the conditional probability distribution P(h|v) of the hidden layer is

P(h|v) = P(v, h) / P(v)
Owing to the RBM's structural characteristics of no connections within a layer and full connections between layers, for a given visible unit state v the activation probability P(h_j = 1|v) of the jth hidden unit can be obtained from the conditional probability distribution P(h|v):

P(h_j = 1|v) = σ(b_j + Σ_i v_i w_ij)   (7)
For a given hidden unit state h, the activation probability P(v_i = 1|h) of the ith visible unit is:

P(v_i = 1|h) = σ(a_i + Σ_j w_ij h_j)   (8)
σ represents an activation function;
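Equations (7) and (8) translate directly into two vectorized expressions; a minimal NumPy sketch, with dimensions chosen to match this method's setting (5 gas inputs, 10 hidden nodes), is:

```python
# Sketch of equations (7) and (8), assuming NumPy; sigma is the Sigmoid.
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, w, b):
    # Equation (7): P(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij)
    return sigma(b + v @ w)

def p_v_given_h(h, w, a):
    # Equation (8): P(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j)
    return sigma(a + w @ h)

rng = np.random.default_rng(0)
n, m = 5, 10                      # e.g. 5 gas inputs, 10 hidden nodes
w = rng.normal(scale=0.1, size=(n, m))
a, b = np.zeros(n), np.zeros(m)
v = rng.integers(0, 2, size=n).astype(float)
print(p_h_given_v(v, w, b))       # m hidden activation probabilities
```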
2 RBM model parameter solving
When solving the RBM model, 2 groups of parameters need to be determined: first, the numbers of visible and hidden layer nodes; second, the model parameters θ. The number of visible layer nodes is the dimension of the input data sample matrix; the number of hidden layer nodes is determined by the actual working conditions, usually 5 to 15 nodes, and 12 nodes are selected for the hidden layer in this experiment. As for the model parameters θ, given a training sample v = [v_1, v_2, ..., v_n], the significance of training the RBM model lies in continuously adjusting θ so that the probability distribution of the RBM visible layer nodes matches the input data as closely as possible.
When solving this optimization problem, the invention introduces the method of taking the derivative of the likelihood function (the marginal probability distribution P(v)).
As can be seen from the equations above, the energy E is inversely related to the probability P(v), so the energy E can be minimized by solving for the maximum of P(v). The likelihood function is maximized by gradient ascent, and the model parameter θ is adjusted according to the following formula:

θ′ = θ + μ ∂ln P(v)/∂θ   (9)

In the formula: θ′ is the adjusted and updated value of the parameter θ; μ is the learning rate.
The contrastive divergence (CD-k) method is adopted to approximately solve the parameter gradient; the specific flow is as follows:
Step 1: input a training sample v_i; set the learning rate μ, the number of training iterations T and the allowable reconstruction error threshold R, and initialize the parameters θ = {w_ij, a_i, b_j}. Preferably, the learning rate μ is taken as 98%, the number of training iterations as 450 and the allowable reconstruction error threshold R as 0.98, and the initial parameter values are set to 1;
Step 2: select 1 training sample v_i, calculate P(h_j = 1|v) using equation (7), sample 1 hidden layer vector h from this distribution, and then calculate the forward gradient ⟨v_i h⟩;

P(h_j = 1|v) = σ(b_j + Σ_i v_i w_ij)   (7)

The forward gradient is calculated as follows:

⟨v_i h_j⟩ = v_i · P(h_j = 1|v)

σ represents the activation function;
Step 3: based on the hidden layer vector h, calculate P(v_i = 1|h) using equation (8), denoted v_i1;

P(v_i = 1|h) = σ(a_i + Σ_j w_ij h_j)   (8)

Step 4: based on v_i1, repeat step 2 to obtain a new hidden layer vector, denoted h_1, and then calculate the inverse gradient ⟨v_i1 h_1⟩;

⟨v_i1 h_1j⟩ = v_i1 · P(h_j = 1|v_i1)

Step 5: update the parameters according to equations (10) to (12), the updated values being denoted w′_ij, a′_i, b′_j:

w′_ij = w_ij + prob(⟨v_i h⟩ − ⟨v_i1 h_1⟩)   (10)
a′_i = a_i + μ|v_i − v_i1|   (11)
b′_j = b_j + μ|h_j − h_j1|   (12)

prob represents a random number of 0 or 1;
Step 6: repeat steps 2 to 5 to calculate new training samples v_i2, v_i3, ..., v_iT and new hidden layer vectors h_2, h_3, ..., h_T, where T is the current number of training iterations; then calculate the reconstruction error r = ||v_iT − v_i||. When T reaches the maximum number of training iterations T_max or r falls below the reconstruction error threshold R, training stops; the hidden layer vector h_T is then taken as the visible layer v of the next RBM layer, and a new round of training is carried out.
3 Deep belief network training
The deep belief network is generally learned with a layer-by-layer training method divided into 2 stages, pre-training and fine-tuning. In the pre-training stage, a bottom-up, layer-by-layer unsupervised learning mode is adopted, in which the parameters of each layer are initialized and the features of the sample data are extracted; in the fine-tuning stage, the initialization parameters are fine-tuned with a BP neural network algorithm according to the features extracted during pre-training, as sketched below.
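A minimal NumPy sketch of this layer-by-layer scheme: each RBM is trained with CD-1 on the hidden activations of the layer beneath it (using probabilities for the reconstruction), and the pre-trained weights would then serve as the initialization that the BP fine-tuning stage reverse-adjusts; the fine-tuning pass itself is only indicated, not implemented, and the input samples are random placeholders.

```python
# Greedy layer-wise pre-training sketch, assuming NumPy. Each layer's RBM is
# trained on the previous layer's hidden activations; BP fine-tuning of the
# stacked weights would follow and is only indicated here.
import numpy as np

rng = np.random.default_rng(0)

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, m, mu=0.05, epochs=50):
    n = data.shape[1]
    w = rng.normal(scale=0.01, size=(n, m))
    a, b = np.zeros(n), np.zeros(m)
    for _ in range(epochs):
        for v0 in data:                       # CD-1, as in section 2
            ph0 = sigma(b + v0 @ w)
            h0 = (rng.random(m) < ph0).astype(float)
            v1 = sigma(a + w @ h0)            # probability reconstruction
            ph1 = sigma(b + v1 @ w)
            w += mu * (np.outer(v0, ph0) - np.outer(v1, ph1))
            a += mu * (v0 - v1)
            b += mu * (ph0 - ph1)
    return w, a, b

# Pre-training: stack of layer sizes, e.g. 5 inputs -> 10 -> 10 hidden nodes.
X = rng.random((200, 5))                      # placeholder input samples
layers, acts = [], X
for m in (10, 10):
    w, a, b = train_rbm(acts, m)
    layers.append((w, a, b))
    acts = sigma(b + acts @ w)                # h of this layer feeds the next

# Fine-tuning stage (not shown): run BP through the stacked w's, using the
# pre-trained values as the initialization that BP then reverse-adjusts.
print([w.shape for w, _, _ in layers])
```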
4 Transformer fault diagnosis model
The transformer fault diagnosis method based on the deep belief network mainly comprises 2 stages, off-line learning and on-line identification, and specifically includes the following steps:
Step (1), data acquisition of fault samples: collect fault samples and divide them into a training sample set and a test sample set; each fault sample comprises the fault type of the transformer and the contents of H2, CH4, C2H6, C2H4 and C2H2 dissolved in oil;
Step (2), creation of a DBN-based fault diagnosis model: the inputs of the fault diagnosis model are the contents of H2, CH4, C2H6, C2H4 and C2H2 dissolved in oil; the output is the fault type;
Set the number of RBM layers and the number of nodes of each RBM layer.
Let each RBM be a two-layer network consisting of 1 visible layer v and 1 hidden layer h, where v = [v_1, v_2, ..., v_n] denotes the n input nodes of the visible layer; h = [h_1, h_2, ..., h_m] denotes the m output nodes of the hidden layer; w = [w_ij] is the n×m connection weight matrix from the input layer to the output layer, where i = 1, 2, ..., n and j = 1, 2, ..., m; a = [a_1, a_2, ..., a_n], where a_i is the bias of the ith visible unit v_i; b = [b_1, b_2, ..., b_m], where b_j is the bias of the jth hidden unit h_j;
Step (3), model pre-training: import the training sample set into the model of step (2), train each RBM layer from bottom to top, layer by layer, and obtain the initialization parameters w_ij, a_i, b_j of each RBM layer;
Step (4), model fine-tuning: reversely adjust the pre-training parameters with a BP neural network algorithm to obtain the final DBN fault diagnosis model, where the pre-training parameters are the initialization parameters w_ij, a_i, b_j of each RBM layer;
Step (5), testing the performance of the transformer fault diagnosis model: import the test sample set into the fault diagnosis model obtained in step (4), perform fault prediction and compare it with the actual working conditions, thereby evaluating the performance of the fault diagnosis model;
Step (6), online fault identification: import the real-time operation data of the transformer into the final DBN fault diagnosis model for fault identification.
In step (1), the ratio of the number of samples in the training sample set to the number in the test sample set is 9:1. The fault types are divided into five types: iron core grounding, coil failure, bare metal overheating, screen discharge and turn insulation damage.
In step (2), the number of RBM layers is set to 5 and the number of nodes of each RBM layer is set to 10.
In step (4), the method for reversely adjusting the pre-training parameters is as follows: take the average of the parameters obtained by the reverse calculation and the initialization parameters. A sketch of this averaging rule follows.
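The reverse-adjustment rule of step (4), averaging each BP-adjusted parameter with its pre-trained initialization, amounts to one line; a sketch assuming NumPy arrays, where w_init and w_bp are hypothetical placeholder names for the pre-trained and BP-adjusted weights:

```python
# Sketch of the step (4) rule: the final parameter is the average of the
# BP-adjusted value and the pre-trained initialization. w_init / w_bp are
# placeholder names, assuming NumPy arrays of equal shape.
import numpy as np

w_init = np.full((10, 10), 1.0)          # pre-trained initialization
w_bp = w_init + np.random.default_rng(0).normal(scale=0.1, size=(10, 10))
w_final = 0.5 * (w_bp + w_init)          # element-wise average
print(w_final.mean())
```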
5 Transformer fault diagnosis example
The invention collects a total of 3985 transformer fault cases from a 110 kV substation of a certain power grid; the cases mainly contain the following information: transformer oil chromatographic data and the fault type of the transformer. 2000 cases were randomly drawn as training data for the fault diagnosis model, and the remaining 1985 cases were used as test data.
In order to verify the effectiveness of the proposed transformer fault diagnosis model, the actual transformer fault types are compared with the fault types output by the diagnosis model. To further verify the superiority of the training speed of the proposed fault diagnosis model, it is also compared with other diagnosis methods; the algorithms participating in the transformer fault diagnosis comparison are:
the transformer fault diagnosis method provided by the invention is named as RBM-T.
Reference: Zhao Wenqing, Yang Hai, Zhou Zhendong, Shao Xujiang. Transformer fault diagnosis based on residual BP neural network [J]. Electric Power Automation Equipment, 2020, 40(02): 143-148. The transformer fault diagnosis algorithm proposed there is named BP-T.
Reference: Fault diagnosis algorithm of hydraulic pump based on empirical wavelet decomposition and convolutional neural network [J]. Chinese Hydraulics & Pneumatics, 2020(01): 163-170. The convolutional-neural-network diagnosis algorithm from this reference, applied here to the transformer data, is named CNN-T.
Table 1 Accuracy of the transformer fault diagnosis models

Fault type               RBM-T     BP-T      CNN-T
Iron core grounding      94.14%    84.23%    93.24%
Coil failure             94.28%    86.39%    94.44%
Bare metal overheating   93.61%    83.88%    93.13%
Screen discharge         93.27%    86.78%    93.17%
Turn insulation damage   95.55%    82.99%    95.65%
Table 1 shows the accuracy of the RBM-T, BP-T and CNN-T methods in diagnosing typical transformer faults. As can be seen from Table 1, the BP-T method has the lowest transformer fault diagnosis accuracy: its accuracy for iron core grounding and coil faults is 84.23% and 86.39%, respectively. For these two fault types, RBM-T reaches 94.14% and 94.28% while CNN-T reaches 93.24% and 94.44%. The average accuracy of the proposed RBM-T transformer fault diagnosis model exceeds 94%, which fully demonstrates the effectiveness of the proposed model.
The foregoing shows and describes the general principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and improvements may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. The transformer fault diagnosis method based on the deep learning model is characterized by comprising the following steps:
Step (1), data acquisition of fault samples: collect fault samples and divide them into a training sample set and a test sample set; each fault sample comprises the fault type of the transformer and the contents of H2, CH4, C2H6, C2H4 and C2H2 dissolved in oil;
Step (2), creation of a DBN-based fault diagnosis model: the inputs of the fault diagnosis model are the contents of H2, CH4, C2H6, C2H4 and C2H2 dissolved in oil; the output is the fault type;
Set the number of RBM layers and the number of nodes of each RBM layer.
Let each RBM be a two-layer network consisting of 1 visible layer v and 1 hidden layer h, where v = [v_1, v_2, ..., v_n] denotes the n input nodes of the visible layer; h = [h_1, h_2, ..., h_m] denotes the m output nodes of the hidden layer; w = [w_ij] is the n×m connection weight matrix from the input layer to the output layer, where i = 1, 2, ..., n and j = 1, 2, ..., m; a = [a_1, a_2, ..., a_n], where a_i is the bias of the ith visible unit v_i; b = [b_1, b_2, ..., b_m], where b_j is the bias of the jth hidden unit h_j;
Step (3), model pre-training: import the training sample set into the model of step (2), train each RBM layer from bottom to top, layer by layer, and obtain the initialization parameters w_ij, a_i, b_j of each RBM layer;
Step (4), model fine-tuning: reversely adjust the pre-training parameters with a BP neural network algorithm to obtain the final DBN fault diagnosis model, where the pre-training parameters are the initialization parameters w_ij, a_i, b_j of each RBM layer;
Step (5), online fault identification: import the real-time operation data of the transformer into the final DBN fault diagnosis model for fault identification.
2. The transformer fault diagnosis method based on the deep learning model as claimed in claim 1, wherein in step (1), the ratio of the number of the training sample set to the number of the testing sample set is 9:1.
3. The transformer fault diagnosis method based on the deep learning model as claimed in claim 1, wherein in step (1), the fault types are divided into five types: iron core grounding, coil failure, bare metal overheating, screen discharge, and turn insulation damage.
4. The transformer fault diagnosis method based on the deep learning model as claimed in claim 1, wherein in step (2), the number of RBM layers is set to be 5 and the number of nodes of each RBM layer is set to be 10.
5. The transformer fault diagnosis method based on the deep learning model as claimed in claim 1, wherein in step (3), the specific method of model pre-training is as follows:
Step (3.1): input a training sample v_i; set the learning rate μ, the number of training iterations T and the allowable reconstruction error threshold R, and initialize the parameters θ = {w_ij, a_i, b_j};
Step (3.2): select 1 training sample v_i, calculate P(h_j = 1|v) using equation (7), sample 1 hidden layer vector h from this distribution, and then calculate the forward gradient ⟨v_i h⟩;

P(h_j = 1|v) = σ(b_j + Σ_i v_i w_ij)   (7)

The forward gradient is calculated as follows:

⟨v_i h_j⟩ = v_i · P(h_j = 1|v)

σ represents the activation function;
Step (3.3): based on the hidden layer vector h, calculate P(v_i = 1|h) using equation (8), denoted v_i1;

P(v_i = 1|h) = σ(a_i + Σ_j w_ij h_j)   (8)

Step (3.4): based on v_i1, repeat step (3.2) to obtain a new hidden layer vector, denoted h_1, and then calculate the inverse gradient ⟨v_i1 h_1⟩;

⟨v_i1 h_1j⟩ = v_i1 · P(h_j = 1|v_i1)

Step (3.5): update the parameters according to equations (10) to (12), the updated values being denoted w′_ij, a′_i, b′_j:

w′_ij = w_ij + prob(⟨v_i h⟩ − ⟨v_i1 h_1⟩)   (10)
a′_i = a_i + μ|v_i − v_i1|   (11)
b′_j = b_j + μ|h_j − h_j1|   (12)

prob represents a random number of 0 or 1;
Step (3.6): repeat steps (3.2) to (3.5) to calculate new training samples v_i2, v_i3, ..., v_iT and new hidden layer vectors h_2, h_3, ..., h_T, where T is the current number of training iterations; then calculate the reconstruction error r = ||v_iT − v_i||. When T reaches the maximum number of training iterations T_max or r falls below the reconstruction error threshold R, training stops.
6. The transformer fault diagnosis method based on the deep learning model according to claim 5, wherein the learning rate μ is 98%, the number of training iterations is 450, and the allowable reconstruction error threshold R is taken to be 0.98.
7. The transformer fault diagnosis method based on the deep learning model as claimed in claim 1, wherein in step (4), the method for reversely adjusting the pre-training parameters is as follows: take the average of the parameters obtained by the reverse calculation and the initialization parameters.
8. The transformer fault diagnosis method based on the deep learning model according to claim 1, wherein in step (4), after the final DBN fault diagnosis model is established, the method further comprises a step of testing the performance of the transformer fault diagnosis model, specifically: import the test sample set into the fault diagnosis model obtained in step (4), perform fault prediction and compare it with the actual working conditions, thereby evaluating the performance of the fault diagnosis model.
CN202011010335.3A 2020-09-23 2020-09-23 Transformer fault diagnosis method based on deep learning model Active CN112308208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011010335.3A CN112308208B (en) 2020-09-23 2020-09-23 Transformer fault diagnosis method based on deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011010335.3A CN112308208B (en) 2020-09-23 2020-09-23 Transformer fault diagnosis method based on deep learning model

Publications (2)

Publication Number Publication Date
CN112308208A CN112308208A (en) 2021-02-02
CN112308208B true CN112308208B (en) 2023-01-24

Family

ID=74488540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011010335.3A Active CN112308208B (en) 2020-09-23 2020-09-23 Transformer fault diagnosis method based on deep learning model

Country Status (1)

Country Link
CN (1) CN112308208B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191417A (en) * 2021-04-26 2021-07-30 三峡大学 Transformer fault identification algorithm with independent characteristic interference self-adaptively considered

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108089099A (en) * 2017-12-18 2018-05-29 广东电网有限责任公司佛山供电局 The diagnostic method of distribution network failure based on depth confidence network
CN109086817A (en) * 2018-07-25 2018-12-25 西安工程大学 A kind of Fault Diagnosis for HV Circuit Breakers method based on deepness belief network
CN109214416A (en) * 2018-07-23 2019-01-15 华南理工大学 A kind of multidimensional information fusion Diagnosis Method of Transformer Faults based on deep learning
CN110084148A (en) * 2019-04-09 2019-08-02 东南大学 A kind of Mechanical Failure of HV Circuit Breaker diagnostic method


Also Published As

Publication number Publication date
CN112308208A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN109086817B (en) High-voltage circuit breaker fault diagnosis method based on deep belief network
CN109635928B (en) Voltage sag reason identification method based on deep learning model fusion
CN105550700B (en) A kind of time series data cleaning method based on association analysis and principal component analysis
CN110596530B (en) Low-current ground fault line selection method
CN109002933B (en) Distribution line route variable relation model optimization method based on Relieff and t-SNE
CN103559540B (en) Based on the wind speed ultra-short term on-line prediction method of Adaptive Neuro-fuzzy Inference
CN111880044B (en) Online fault positioning method for distribution network containing distributed power supply
CN114186379B (en) Transformer state evaluation method based on echo network and depth residual neutral network
CN112051480A (en) Neural network power distribution network fault diagnosis method and system based on variational modal decomposition
CN113177357B (en) Transient stability assessment method for power system
CN109740859A (en) Transformer condition evaluation and system based on Principal Component Analysis and support vector machines
CN111507422B (en) CQFPA-WNN-based transformer fault diagnosis method
CN112381180B (en) Power equipment fault monitoring method based on mutual reconstruction single-class self-encoder
CN105574589A (en) Transformer oil chromatogram fault diagnosis method based on ecological niche genetic algorithm
CN113379116A (en) Cluster and convolutional neural network-based line loss prediction method for transformer area
Peng et al. A very short term wind power prediction approach based on Multilayer Restricted Boltzmann Machine
CN112308208B (en) Transformer fault diagnosis method based on deep learning model
CN111539486A (en) Transformer fault diagnosis method based on Dropout deep confidence network
CN111091141B (en) Photovoltaic backboard fault diagnosis method based on layered Softmax
CN112990593A (en) Transformer fault diagnosis and state prediction method based on CSO-ANN-EL algorithm
Yin et al. Deep learning based feature reduction for power system transient stability assessment
CN116843057A (en) Wind power ultra-short-term prediction method based on LSTM-ViT
CN116089857A (en) Transformer fault identification method based on CEEMDAN-DBN
CN115310373B (en) Hydrogen production electrolytic tank residual life prediction method
CN116702580A (en) Fermentation process fault monitoring method based on attention convolution self-encoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant