CN110866604A - Cleaning method for power transformer state monitoring data - Google Patents

Cleaning method for power transformer state monitoring data

Info

Publication number
CN110866604A
CN110866604A (application CN201911032677.2A)
Authority
CN
China
Prior art keywords
data
encoder
noise reduction
layer
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911032677.2A
Other languages
Chinese (zh)
Inventor
高树国
夏彦卫
李刚
刘云鹏
张博
许自强
臧谦
赵军
刘宏亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
North China Electric Power University
Original Assignee
Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd, North China Electric Power University filed Critical Electric Power Research Institute of State Grid Hebei Electric Power Co Ltd
Priority to CN201911032677.2A priority Critical patent/CN110866604A/en
Publication of CN110866604A publication Critical patent/CN110866604A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Abstract

A method for cleaning power transformer state monitoring data. A stack noise reduction self-encoder is established for the dissolved-gas-in-oil monitoring data of an oil-immersed power transformer. Each self-encoder in the stack is trained one by one, and the trained encoders are then stacked and fine-tuned to obtain the final stack noise reduction self-encoder data cleaning model. The gas-in-oil monitoring data of the oil-immersed power transformer are input into this model for processing, achieving noise reduction of the original data. Compared with traditional data cleaning based on statistical methods, the method starts from the essential characteristics of the data, is free of the constraints of index-based judgment standards, removes data noise more effectively, and provides reliable monitoring data for tasks such as equipment state evaluation and equipment service life prediction.

Description

Cleaning method for power transformer state monitoring data
Technical Field
The invention relates to a method for cleaning power transformer state monitoring data that eliminates data noise and provides reliable monitoring data for tasks such as equipment state evaluation and equipment service life prediction, and belongs to the technical field of transformers.
Background
The power transformer is one of the core components of a power system, and many transformer state evaluation and fault diagnosis methods have been developed to ensure its efficient, durable, and stable operation. Meanwhile, with the widespread application of various sensor technologies, both the volume and the dimensionality of the monitoring data that can be collected from a power transformer have grown explosively, and the data increasingly exhibit multi-source, heterogeneous characteristics.
Because operating conditions differ between transformers and communication lines suffer losses, the state data acquired from a remote transformer are often mixed with considerable noise. This affects subsequent equipment state evaluation, increases evaluation errors, and disturbs the normal operation of the power system. Data cleaning of power transformer state monitoring data is therefore very important.
For the data cleaning problem, researchers at home and abroad have mainly studied abnormal-value detection, missing-value repair, and noise removal. Because data cleaning is highly application-specific, a cleaning strategy must be designed for the particular application scenario and data acquisition object. Data cleaning methods for power transformer state monitoring data fall mainly into two categories:
One category comprises methods based on statistical theory, which preprocess single or low-dimensional data relatively effectively. The other category comprises methods based on data mining, for example data cleaning based on time-series analysis; within this category, methods based on machine learning and deep learning start from the essential features of the data and can perform noise reduction that matches the data characteristics.
In the data cleaning process, the same data may be analyzed for different use cases; if the original data set is transformed using only the business rules of one use case, other use cases may be negatively affected.
Meanwhile, the conventional data quality research mode maps data dimensions onto a given data quality evaluation framework, which requires investigating many kinds of data, such as transformer state parameters, judging which data dimensions are more important, and establishing a corresponding data quality evaluation framework.
At present there are many research results on data cleaning at home and abroad. For wind speed-power data of wind turbine generators, an optimal intra-group variance cleaning algorithm has been proposed. For cleaning abnormal equipment fault information, a data cleaning method based on time-series analysis using double-loop iterative inspection has been applied to lead temperature data and dissolved-gas (CH4) monitoring data in oil; it corrects noisy data points and fills missing values and achieves good experimental results, but the cleaned time series shows large errors near the times at which abnormal values occur in the original series. Similarly, processing single state quantities with a time-series autoregressive model and quantizing the series with a self-organizing neural network allows multi-state-quantity equipment data to be mined; compared with judging abnormality from state-quantity thresholds, this simplifies the complex correlations among multi-dimensional state parameters, but the result is easily disturbed by abnormalities in the external environment. At the present stage, however, data cleaning models for the daily monitoring data of electrical equipment are rarely studied: researchers focus more on how the acquired state monitoring data can support subsequent work such as equipment state evaluation and equipment life prediction, while research on cleaning and correcting the state monitoring data itself is lacking.
Disclosure of Invention
The object of the invention is to provide, in view of the shortcomings of the prior art, a method for cleaning power transformer state monitoring data that eliminates data noise and provides reliable monitoring data for tasks such as equipment state evaluation and equipment service life prediction.
The problems of the invention are solved by the following technical scheme:
a method for cleaning state monitoring data of a power transformer includes the steps of establishing stack noise reduction self-encoders for monitoring data of gas in oil of an oil-immersed power transformer, conducting stacking and fine-tuning processing after each self-encoder in the stack noise reduction self-encoders is trained one by one to obtain a final stack noise reduction self-encoder data cleaning model, inputting monitoring data of the gas in the oil of the oil-immersed power transformer into the stack noise reduction self-encoder data cleaning model for processing, and achieving the purpose of noise reduction of original data.
In the above cleaning method for power transformer state monitoring data, the stack noise reduction self-encoder data cleaning model is established and trained by the following steps:
a. train the outermost first noise reduction self-encoder, whose structure is: input layer, hidden layer 1, output layer; the first-layer parameters are trained and calculated from the reconstruction error between the output layer and the input layer;
b. using the hidden layer 1 neuron structure obtained in step a, construct a noise reduction self-encoder with the structure: hidden layer 1, hidden layer 2, hidden layer 3; the second-layer parameters are trained and calculated from the reconstruction error between hidden layer 3 and hidden layer 1;
c. by analogy, train all noise reduction self-encoders in the stack noise reduction self-encoder step by step in sequence;
d. substitute the weights and bias vectors of the different hidden layers into the corresponding positions of the whole neural network to obtain the complete stack noise reduction self-encoder data cleaning model;
e. fine-tune the parameters of the stack noise reduction self-encoder data cleaning model:
first, noise is added to every weight and bias in the network to break the symmetry of the parameters formed during layer-by-layer pre-training; second, the whole neural network is trained by gradient descent using a cross-entropy loss that differs from the loss used by each noise reduction self-encoder layer, yielding the trained stack noise reduction self-encoder data cleaning model (a minimal structural sketch of these steps follows).
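As a minimal structural sketch of steps a-e, the following numpy code assembles a stack noise reduction self-encoder from per-layer parameters that are assumed to have been pretrained one by one, and runs the forward pass used for cleaning. The helper name clean, the variables encoder_params and decoder_params, and the layer sizes are illustrative assumptions, not part of the patent.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def clean(x, encoder_params, decoder_params):
    """Forward pass of the assembled stack: encode through every hidden layer,
    then decode back to the input dimension, which yields the cleaned record."""
    h = x
    for W, b in encoder_params:          # steps a-c: layer-wise pretrained encoders
        h = relu(W @ h + b)
    for W_p, b_p in decoder_params:      # step d: decoders placed at the mirror positions
        h = relu(W_p @ h + b_p)
    return h

# illustrative layer sizes for a 5-dimensional gas-in-oil record: 5 -> 4 -> 3 -> 4 -> 5
rng = np.random.default_rng(0)
dims = [5, 4, 3]
encoder_params = [(rng.normal(scale=0.1, size=(dims[i + 1], dims[i])), np.zeros(dims[i + 1]))
                  for i in range(len(dims) - 1)]
decoder_params = [(W.T.copy(), np.zeros(W.shape[1])) for W, _ in reversed(encoder_params)]
x_cleaned = clean(np.array([12.4, 3.1, 0.8, 27.5, 5.6]), encoder_params, decoder_params)
```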
In the cleaning method for the power transformer state monitoring data, the establishment and training of the single noise reduction self-encoder in the stack noise reduction self-encoder data cleaning model comprises the following steps:
① Assume the original input data is x; it is processed by random inactivation (dropout), i.e.
r^(i) ~ Bernoulli(p)
x̃^(i) = r^(i)·x^(i)
where r^(i) is a Bernoulli distribution with probability p, x^(i) is the i-th neuron node of the input data, and x̃^(i) is the neuron node after random inactivation;
② The noise-added input data x̃^(i) is fed into the neural network for encoding and enters the intermediate hidden layer; the calculation is
y^(i) = s(W^(i)·x̃^(i) + b^(i))
where y^(i) is the encoded output matrix; W^(i) is the weight matrix of the current hidden layer, of order m × n, with n the dimension of the current input layer and m the dimension of the next hidden layer; b^(i) is a bias vector of order m; s is the neural network activation function;
③ The encoded data enter the hidden layer of the noise reduction self-encoder, and the hidden-layer data are then reconstructed; the reconstruction function is
x̂^(i) = s(W'^(i)·y^(i) + b'^(i))
where x̂^(i) is the reconstructed output; y^(i) is the output of step ②, i.e. the data in the hidden layer; the weight matrix W'^(i) is the transpose of the weight matrix W^(i), of order n × m, with m the current hidden-layer dimension and n the dimension of the next reconstructed output layer; b'^(i) is a bias vector of order n;
④ Calculate the mean square error between the reconstructed output x̂^(i) and the input x̃^(i):
J_MSE = SSE/n = (1/n)·Σ_{i=1..n} w_i·(y_i - ŷ_i)²
where J_MSE is the mean square error value, y_i is the original input data, and ŷ_i is the reconstructed fitted data output; w_i is a value greater than 0; SSE is the sum of squared errors, calculated as
SSE = Σ_{i=1..n} w_i·(y_i - ŷ_i)²
⑤ Using the mean square error value J_MSE obtained in step ④, perform supervised fine-tuning of the network parameters according to
W^(i)_{n+1} = W^(i)_n - (η/m)·∂J_MSE/∂W^(i)_n
b^(i)_{n+1} = b^(i)_n - (η/m)·∂J_MSE/∂b^(i)_n
where W^(i)_n is the weight of the i-th layer neurons after n iterations of back-propagated residual updates, b^(i)_n is the bias of the i-th layer neurons after n iterations of back-propagated residual updates, η is the learning rate set in the neural network, and m is the number of samples participating in the current training round.
In the above cleaning method, the stack noise reduction self-encoder adopts the cross entropy as the loss function of the overall input and output:
H(p, q) = -Σ_x p(x)·log q(x)
where p(x) is the expected output of the data, q(x) is the actually computed output, and H(p, q) is the overall cross entropy;
the activation function adopted by each layer of the stack noise reduction self-encoder is the ReLU function.
Compared with traditional data cleaning based on statistical methods, the method starts from the essential characteristics of the data, is free of the constraints of index-based judgment standards, removes data noise more effectively, and provides reliable monitoring data for tasks such as equipment state evaluation and equipment service life prediction.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a two-dimensional data fitting graph and data cleaning principle;
FIG. 2 is a block diagram of an overall model structure of a stacked denoising autoencoder;
FIG. 3 is a process of internal network training for a stacked denoising autoencoder;
FIG. 4 is a three-layer self-encoding neural network;
FIG. 5 is the structure of a stacked self-encoder;
FIG. 6 illustrates the training principle of the self-encoder;
FIG. 7 illustrates the training principle of the stacked self-encoder.
The notation used herein: x is the original input data; r^(i) is a Bernoulli distribution with probability p; x^(i) is the i-th neuron node of the input data; x̃^(i) is the neuron node after random inactivation; y^(i) is the encoded output matrix; W^(i) is the weight matrix of the current hidden layer; b^(i) is a bias vector of order m; s is the neural network activation function; x̂^(i) is the reconstructed output; W'^(i) is the transpose of the weight matrix W^(i); b'^(i) is a bias vector of order n; J_MSE is the mean square error value; y_i is the original input data; ŷ_i is the reconstructed fitted data output; w_i is a value greater than 0; SSE is the sum of squared errors; W^(i)_n is the weight of the i-th layer neurons after n iterations of back-propagated residual updates; b^(i)_n is the bias of the i-th layer neurons after n iterations of back-propagated residual updates; η is the learning rate set in the neural network; p(x) is the expected output of the data; q(x) is the actually computed output; H(p, q) is the overall cross entropy.
Detailed Description
Owing to the working characteristics of the equipment and daily maintenance requirements, the collected power transformer state monitoring data fall roughly into three types: text data, including equipment maintenance and test records, inspection and defect-elimination records, fault and defect description reports, event sequence records, and the like; image data, such as transformer bushing, oil temperature, partial discharge, and winding temperature images; and, most often, numerical data, such as dissolved-gas-in-oil data of the power transformer. The numerical data of a power transformer are mostly time-series data that fluctuate in two dimensions over time; because the working environments of power transformers differ and sensor failures, communication-line interference, and similar situations may occur, the data collected by background staff may contain isolated points that deviate from expected values (shown as solid white data points in FIG. 1) or singular points with missing data (shown as dotted white data points in FIG. 1).
The stack noise reduction self-encoder learns the nonlinear characteristics of the normal data points in the original data and thereby obtains the distribution characteristics of normal data. For data points that are randomly discarded or corrupted with Gaussian noise at the input, the stack noise reduction self-encoder learns the data characteristics and expectations from the other, uncorrupted data points and predicts where the damaged points should lie in the sample data stream so as to satisfy the overall sample characteristics. Likewise, for missing points and abnormal points originally present in the data, the same mechanism refits these samples to feature positions that satisfy the overall distribution of the samples according to the implicit overall data distribution, so that the reconstruction error of the samples gradually decreases and the purpose of data cleaning is achieved.
In view of the above data classification and the characteristics of power transformer state monitoring data, the invention establishes a data cleaning model based on the stack noise reduction self-encoder for the dissolved-gas-in-oil information of the oil-immersed power transformer; the specific structure of the model is shown in FIG. 2.
The establishment and training of a single noise reduction self-encoder is explained first.
Step 1: Assume the original input data is x; it is processed by random inactivation (dropout), i.e.
r^(i) ~ Bernoulli(p)    (3-1)
x̃^(i) = r^(i)·x^(i)    (3-2)
where r^(i) is a Bernoulli distribution with probability p, x^(i) is the i-th neuron node of the input data, and x̃^(i) is the neuron node after random inactivation.
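As a minimal illustration of formulas (3-1)-(3-2), the numpy sketch below draws a Bernoulli mask with keep probability p and multiplies it element-wise with the input. The function name corrupt_dropout, the default p = 0.9, and the sample values are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def corrupt_dropout(x, p=0.9, rng=None):
    """Random inactivation per (3-1)-(3-2): r ~ Bernoulli(p), x_tilde = r * x."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.binomial(1, p, size=x.shape)   # (3-1): keep each node with probability p
    return r * x                           # (3-2): corrupted input x_tilde

# usage on one row of gas-in-oil style monitoring data (made-up placeholder values)
x = np.array([12.4, 3.1, 0.8, 27.5, 5.6])
x_tilde = corrupt_dropout(x, p=0.8)
```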
Step 2: The noise-added input data x̃^(i) is fed into the neural network for encoding and enters the intermediate hidden layer; the calculation is
y^(i) = s(W^(i)·x̃^(i) + b^(i))    (3-3)
where W^(i) is the weight matrix of the current hidden layer, of order m × n, with n the dimension of the current input layer and m the dimension of the next hidden layer; b^(i) is a bias vector of order m; s is the activation function of the neural network. The commonly used activation functions are the following:
The Sigmoid function, whose output lies in the interval (0, 1):
s(z) = 1/(1 + e^(-z))    (3-4)
The Tanh function, which is centered on the origin and whose output lies in [-1, 1]:
s(z) = (e^z - e^(-z))/(e^z + e^(-z))    (3-5)
The ReLU function, whose output is zero or positive and which computes faster than the previous two because no exponential operation is involved:
f(z) = max(0, z)    (3-6)
Because the experimental data involved in the method are all non-negative discrete values, adopting the ReLU function reduces the gradient saturation problem and speeds up model training.
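For reference, a small numpy sketch of the three activation functions (3-4)-(3-6) discussed above; the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # (3-4): output in (0, 1)

def tanh(z):
    return np.tanh(z)                  # (3-5): output in [-1, 1], centered on the origin

def relu(z):
    return np.maximum(0.0, z)          # (3-6): zero or positive, no exponential involved

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```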
Step 3: The encoded data enter the hidden layer of the noise reduction self-encoder, and the hidden-layer data are then reconstructed; the reconstruction function is
x̂^(i) = s(W'^(i)·y^(i) + b'^(i))    (3-7)
where y^(i) is the output of Step 2, i.e. the data in the hidden layer; the weight matrix W'^(i) is the transpose of the weight matrix W^(i), of order n × m, with m the current hidden-layer dimension and n the dimension of the next reconstructed output layer; b'^(i) is a bias vector of order n; and the activation function, as in the hidden-layer encoding stage, is the ReLU function.
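Putting formulas (3-3) and (3-7) together, the sketch below runs one forward pass of a single noise reduction self-encoder with tied (transposed) weights. The shapes, the helper name dae_forward, and the random initialization are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dae_forward(x_tilde, W, b, b_prime):
    """One noise reduction self-encoder pass with tied weights W' = W.T.

    x_tilde : corrupted input, shape (n,)
    W       : encoder weight matrix, shape (m, n), the m x n order in (3-3)
    b       : encoder bias, shape (m,)
    b_prime : decoder bias, shape (n,)
    """
    y = relu(W @ x_tilde + b)         # (3-3): encode into the hidden layer
    x_hat = relu(W.T @ y + b_prime)   # (3-7): reconstruct with the transposed weights
    return y, x_hat

rng = np.random.default_rng(0)
n, m = 5, 3
W = rng.normal(scale=0.1, size=(m, n))
y, x_hat = dae_forward(rng.random(n), W, np.zeros(m), np.zeros(n))
```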
Step 4: Calculate the reconstruction error between the reconstructed output x̂^(i) and the input x̃^(i). Because the experimental data of the invention are dissolved-gas-in-oil data of a power transformer, which are discrete numerical data, the reconstruction error, i.e. the loss function, is calculated as a mean square error:
J_MSE = SSE/n = (1/n)·Σ_{i=1..n} w_i·(y_i - ŷ_i)²    (3-8)
where y_i is the original input data and ŷ_i is the reconstructed fitted data output; w_i is a value greater than 0, used to reflect more intuitively how closely the mean square error approaches 0, i.e. how well the reconstructed data restore the original input; SSE is the sum of squared errors:
SSE = Σ_{i=1..n} w_i·(y_i - ŷ_i)²    (3-9)
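A minimal numpy sketch of the reconstruction loss (3-8)-(3-9), with uniform weights w_i = 1 chosen as an illustrative assumption:

```python
import numpy as np

def reconstruction_loss(y, y_hat, w=None):
    """Weighted sum of squared errors (3-9) and mean square error (3-8)."""
    w = np.ones_like(y) if w is None else w
    sse = np.sum(w * (y - y_hat) ** 2)   # (3-9)
    mse = sse / y.size                    # (3-8)
    return mse, sse

y = np.array([12.4, 3.1, 0.8, 27.5, 5.6])       # original input
y_hat = np.array([12.0, 3.3, 1.0, 26.9, 5.8])   # reconstructed output
mse, sse = reconstruction_loss(y, y_hat)
```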
Step 5: Using the mean square error value obtained in Step 4, perform supervised fine-tuning of the network parameters: the parameters are adjusted by gradient descent to further reduce the loss function, so that the reconstructed data restore the input data as closely as possible. The update formulas are
W^(i)_{n+1} = W^(i)_n - (η/m)·∂J_MSE/∂W^(i)_n    (3-10)
b^(i)_{n+1} = b^(i)_n - (η/m)·∂J_MSE/∂b^(i)_n    (3-11)
where W^(i)_n is the weight of the i-th layer neurons after n iterations of back-propagated residual updates, b^(i)_n is the bias of the i-th layer neurons after n iterations of back-propagated residual updates, η is the learning rate set in the neural network, and m is the number of samples participating in the current training round.
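A minimal numpy sketch of update rule (3-10)-(3-11), applied here to a toy linear reconstruction layer so that the gradients stay exact and short; the function name gradient_step, the learning rate, and the batch size are illustrative assumptions rather than the patent's settings.

```python
import numpy as np

def gradient_step(W, b, dJ_dW, dJ_db, eta, m):
    """One supervised fine-tuning step per (3-10)-(3-11): theta <- theta - (eta/m) * dJ/dtheta."""
    return W - (eta / m) * dJ_dW, b - (eta / m) * dJ_db

# toy example: a linear layer x_hat = x @ W.T + b trained to reproduce its input
rng = np.random.default_rng(1)
m_samples, dim = 8, 4
X = rng.random((m_samples, dim))
W, b = rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim)
for _ in range(500):
    X_hat = X @ W.T + b
    err = X_hat - X
    dJ_dW = 2 * err.T @ X         # gradient of the squared error summed over the batch
    dJ_db = 2 * err.sum(axis=0)
    W, b = gradient_step(W, b, dJ_dW, dJ_db, eta=0.1, m=m_samples)
```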
To ensure that, while noise is added, the output restores the input data as far as possible after the neural network has repaired and cleaned the input, the invention trains each self-encoder one by one and then stacks them, rather than building a complete stacked self-encoder network and training the whole network at once. The specific training procedure of the whole stack noise reduction self-encoder neural network is shown in FIG. 3.
Step 1: Train the outermost first noise reduction self-encoder, whose structure is: input layer, hidden layer 1, output layer. The first-layer parameters are trained and calculated from the reconstruction error between the output layer and the input layer, and the resulting weights and bias vectors are placed at the corresponding positions of the whole stack network.
Step 2: Using the hidden layer 1 neuron structure obtained in Step 1, construct a noise reduction self-encoder with the structure: hidden layer 1, hidden layer 2, hidden layer 3. The second-layer parameters are trained and calculated from the reconstruction error between hidden layer 3 and hidden layer 1, and the resulting parameters are placed at the corresponding positions of the whole stack network.
Step 3: By analogy, train all noise reduction self-encoder units step by step and place their parameters at the corresponding positions of the stack network; note that the parameters of each noise reduction self-encoder unit are trained independently at this stage.
Step 4: Using the trained parameters of each noise reduction self-encoder layer, place the weights and biases of the different hidden layers at the corresponding positions of the whole neural network, establishing the complete stack noise reduction self-encoder model.
Step 5: Fine-tune the parameters of the stack noise reduction self-encoder. First, a small amount of noise is added to every weight and bias in the network, forcing the neural network to break the symmetry of the parameters formed during layer-by-layer pre-training and avoiding zero gradients as far as possible. Second, unlike the loss function adopted by each noise reduction self-encoder layer, the output layer of the whole neural network adopts the cross entropy as the final loss function of the overall input and output:
H(p, q) = -Σ_x p(x)·log q(x)    (3-12)
where p(x) is the expected output of the data, q(x) is the actually computed output, and H(p, q) is the overall cross entropy.
The whole neural network is then trained by gradient descent so that its loss function gradually approaches the minimum, and the training effect of the whole network is monitored by outputting the value of the loss function. By training step by step to restore the damaged input to the original output, noise reduction of the original data is achieved.
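The sketch below strings the preceding steps together in numpy: each noise reduction self-encoder is pretrained greedily on the output of the previous hidden layer (Steps 1-3), the pretrained parameters are collected for the full stack (Step 4), and a cross-entropy-style objective of the form (3-12) is provided for the fine-tuning stage (Step 5). It is a minimal illustration under stated assumptions (tied decoder weights W' = W.T, uniform weights w_i = 1 in the per-layer loss, illustrative layer sizes and helper names such as pretrain_layer and greedy_pretrain), not the patent's exact implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def pretrain_layer(H, hidden_dim, epochs=200, lr=0.05, p_keep=0.9, rng=None):
    """Pretrain one tied-weight noise reduction self-encoder on data H (samples x features)."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples, n_in = H.shape
    W = rng.normal(scale=0.1, size=(hidden_dim, n_in))
    b, b_prime = np.zeros(hidden_dim), np.zeros(n_in)
    for _ in range(epochs):
        H_tilde = rng.binomial(1, p_keep, H.shape) * H     # (3-1)-(3-2) corruption
        A1 = H_tilde @ W.T + b
        Y = relu(A1)                                        # (3-3) encode
        A2 = Y @ W + b_prime
        H_hat = relu(A2)                                    # (3-7) decode with W' = W.T
        d2 = 2 * (H_hat - H) * (A2 > 0)                     # gradient of the squared error (3-8)
        d1 = (d2 @ W.T) * (A1 > 0)
        gW = d1.T @ H_tilde + Y.T @ d2                      # tied weights: both paths contribute
        W -= (lr / n_samples) * gW                          # (3-10)
        b -= (lr / n_samples) * d1.sum(axis=0)              # (3-11)
        b_prime -= (lr / n_samples) * d2.sum(axis=0)
    return W, b

def greedy_pretrain(X, layer_dims):
    """Steps 1-4: train each layer on the previous hidden output, then collect the parameters."""
    params, H = [], X
    for dim in layer_dims:
        W, b = pretrain_layer(H, dim)
        H = relu(H @ W.T + b)
        params.append((W, b))
    return params

def cross_entropy(p, q, eps=1e-12):
    """Overall loss of the form (3-12), evaluated during fine-tuning (Step 5)."""
    return -np.sum(p * np.log(q + eps))

# toy gas-in-oil style data (made-up values), three hidden layers of illustrative sizes
rng = np.random.default_rng(0)
X = rng.random((64, 8))
stacked_params = greedy_pretrain(X, layer_dims=[6, 4, 2])
# during Step 5 the whole stack would be trained by gradient descent on a loss of this form
loss_example = cross_entropy(np.array([0.7, 0.3]), np.array([0.6, 0.4]))
```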
The method for cleaning the state monitoring data of the power transformer provided by the invention establishes a data cleaning model by taking a stack noise reduction self-encoder as a theoretical basis.
Unlike traditional data cleaning based on statistical methods, the stack noise reduction self-encoder based cleaning method for power transformer state monitoring data starts from the essential characteristics of the data, is free of the constraints of index-based judgment standards, better accomplishes the processing of power transformer state monitoring data, and is feasible and of practical application value.
Information on related art
The noise reduction self-encoder was proposed by Vincent et al. in 2008. On the basis of the traditional self-encoder, noise is randomly added to the input data, so that the network learns the characteristics of the damaged input data and approximately restores it to the noise-free input. Through this self-reconstruction process, data features are extracted and learned. For higher-dimensional data, a single noise reduction self-encoder cannot learn all the data features at once; to learn as much of the implicit relationships in the data as possible, several noise reduction self-encoders can be stacked to form a stacked denoising autoencoder (SDAE), which better overcomes this difficulty.
The self-encoder is an artificial neural network whose most prominent characteristic is that the dimensions of its inner layers are much lower than that of the original data, with the smallest dimension in the middle layer of the network. In the three-layer self-encoding neural network shown in FIG. 4, the dimension of each layer decreases gradually from the outside inwards.
As shown in FIG. 4, the general operation of the self-encoder can be summarized as a learning process that restores the reconstructed output data to the original input data as closely as possible, i.e. the output data reproduce the input data. This network structure enables the self-encoder to obtain, in its middle layer, the characteristics of the high-dimensional original data without any supervision (i.e. the input training set does not need to be labelled), and it works well for reducing data dimensionality.
An autoencoder typically consists of two parts: an encoding part and a decoding part. The encoding part, also called the recognition part, converts the original input data into the encoder's internal representation; the decoding part, also called the generating part, converts that internal representation back into output data. Self-encoders typically have an architecture similar to that of a multi-layer perceptron (MLP), but unlike an MLP, they require the input and output data dimensions to remain the same so that the output can reproduce the input.
As described above, a self-encoder generally has a single hidden layer. When processing low-dimensional data sets it offers good feature extraction and data reproduction, but when faced with data sets of large volume or high dimensionality, a single-hidden-layer self-encoder has difficulty achieving dimension reduction and reproduction of a large amount of complex data.
Like other neural network models, the self-encoder can increase the number of hidden layers to form a stacked structure; each layer implements a small amount of dimension reduction (i.e. learns a portion of the data features), so that the stacked self-encoder as a whole can learn more complex encoding patterns. The structure is shown in FIG. 5.
Corresponding layers have the same dimensions; for example, the number of nodes of the input layer equals that of the output layer, and the number of nodes of hidden layer 1 equals that of hidden layer 3. Each input-side hidden layer is restored by its corresponding output-side hidden layer, so that the whole network is better able to handle complex data sets, and the stacked self-encoder has better data reconstruction and reproduction capability for the original data set.
If a multilayer self-encoding network is built purely to reconstruct the input, the network's capacity for self-learning during multilayer training is neglected. To bring out the learning capability of each hidden layer of the deep network, Gaussian noise can be added randomly to the original data before training begins, and the training model then tries to restore the output data to the original data form as far as possible; this yields the stack noise reduction self-encoder (stacked denoising autoencoder, SDAE).
Unlike traditional self-encoding, the noise reduction self-encoder randomly adds noise to the input of the traditional self-encoder, so that the original data are no longer "pure", which in effect increases the randomness of the input data. The added noise falls into two types: one is Gaussian noise; the other randomly sets input data elements (i.e. input neurons) to zero. FIG. 6 compares the noise reduction self-encoder with the traditional self-encoder.
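A short numpy sketch of the two corruption types described above, Gaussian noise versus random zeroing (masking); the noise level sigma, the masking probability, and the sample values are illustrative assumptions.

```python
import numpy as np

def corrupt_gaussian(x, sigma=0.1, rng=None):
    """Type 1: add Gaussian noise to every input element."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)

def corrupt_masking(x, p_zero=0.2, rng=None):
    """Type 2: randomly set input elements (input neurons) to zero."""
    rng = np.random.default_rng() if rng is None else rng
    return x * (rng.random(x.shape) >= p_zero)

x = np.array([12.4, 3.1, 0.8, 27.5, 5.6])
print(corrupt_gaussian(x), corrupt_masking(x), sep="\n")
```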
As shown in FIG. 6, X is the input layer that receives the data, Y is the intermediate hidden layer inside the neural network, Z is the output layer that outputs the reconstructed input data, and L is the loss function, typically a cross-entropy loss. The training goal is to make the input X and the output Z as equal as possible, i.e. Z ≈ X, so as to achieve data reconstruction. In the training principle of the stacked self-encoder in FIG. 7, noise is added randomly to the original input data X (the black neurons in FIG. 7) to obtain the damaged data X̃; the damaged data X̃ are passed through the training model to obtain the output data, the parameters are updated by back-propagation through the loss function, and finally a result similar to that of the ordinary self-encoder is obtained, with the output approximately restoring the original input X.

Claims (4)

1. A method for cleaning power transformer state monitoring data, characterized in that: a stack noise reduction self-encoder is established for the dissolved-gas-in-oil monitoring data of an oil-immersed power transformer; each self-encoder in the stack noise reduction self-encoder is trained one by one and then stacked and fine-tuned to obtain the final stack noise reduction self-encoder data cleaning model; and finally the gas-in-oil monitoring data of the oil-immersed power transformer are input into the stack noise reduction self-encoder data cleaning model for processing, achieving noise reduction of the original data.
2. The method for cleaning power transformer state monitoring data according to claim 1, characterized in that the stack noise reduction self-encoder data cleaning model is established and trained by the following steps:
a. train the outermost first noise reduction self-encoder, whose structure is: input layer, hidden layer 1, output layer; the first-layer parameters are trained and calculated from the reconstruction error between the output layer and the input layer;
b. using the hidden layer 1 neuron structure obtained in step a, construct a noise reduction self-encoder with the structure: hidden layer 1, hidden layer 2, hidden layer 3; the second-layer parameters are trained and calculated from the reconstruction error between hidden layer 3 and hidden layer 1;
c. by analogy, train all noise reduction self-encoders in the stack noise reduction self-encoder step by step in sequence;
d. substitute the weights and bias vectors of the different hidden layers into the corresponding positions of the whole neural network to obtain the complete stack noise reduction self-encoder data cleaning model;
e. fine-tune the parameters of the stack noise reduction self-encoder data cleaning model: first, noise is added to every weight and bias in the network to break the symmetry of the parameters formed during layer-by-layer pre-training; second, the whole neural network is trained by gradient descent using a loss function different from that of each noise reduction self-encoder layer, yielding the trained stack noise reduction self-encoder data cleaning model.
3. The method for cleaning the condition monitoring data of the power transformer as claimed in claim 2, wherein the establishment and training of the single noise reduction self-encoder in the stack noise reduction self-encoder data cleaning model comprises the following steps:
① Assume the original input data is x; it is processed by random inactivation (dropout), i.e.
r^(i) ~ Bernoulli(p)
x̃^(i) = r^(i)·x^(i)
where r^(i) is a Bernoulli distribution with probability p, x^(i) is the i-th neuron node of the input data, and x̃^(i) is the neuron node after random inactivation;
② The noise-added input data x̃^(i) is fed into the neural network for encoding and enters the intermediate hidden layer; the calculation is
y^(i) = s(W^(i)·x̃^(i) + b^(i))
where y^(i) is the encoded output matrix; W^(i) is the weight matrix of the current hidden layer, of order m × n, with n the dimension of the current input layer and m the dimension of the next hidden layer; b^(i) is a bias vector of order m; s is the neural network activation function;
③ The encoded data enter the hidden layer of the noise reduction self-encoder, and the hidden-layer data are then reconstructed; the reconstruction function is
x̂^(i) = s(W'^(i)·y^(i) + b'^(i))
where x̂^(i) is the reconstructed output; y^(i) is the output of step ②, i.e. the data in the hidden layer; the weight matrix W'^(i) is the transpose of the weight matrix W^(i), of order n × m, with m the current hidden-layer dimension and n the dimension of the next reconstructed output layer; b'^(i) is a bias vector of order n;
④ Calculate the mean square error between the reconstructed output x̂^(i) and the input x̃^(i):
J_MSE = SSE/n = (1/n)·Σ_{i=1..n} w_i·(y_i - ŷ_i)²
where J_MSE is the mean square error value, y_i is the original input data, and ŷ_i is the reconstructed fitted data output; w_i is a value greater than 0; SSE is the sum of squared errors, calculated as
SSE = Σ_{i=1..n} w_i·(y_i - ŷ_i)²;
⑤ Using the mean square error value J_MSE obtained in step ④, perform supervised fine-tuning of the network parameters according to
W^(i)_{n+1} = W^(i)_n - (η/m)·∂J_MSE/∂W^(i)_n
b^(i)_{n+1} = b^(i)_n - (η/m)·∂J_MSE/∂b^(i)_n
where W^(i)_n is the weight of the i-th layer neurons after n iterations of back-propagated residual updates, b^(i)_n is the bias of the i-th layer neurons after n iterations of back-propagated residual updates, η is the learning rate set in the neural network, and m is the number of samples participating in the current training round.
4. The method as claimed in claim 3, characterized in that the stack noise reduction self-encoder adopts the cross entropy as the loss function of the overall input and output:
H(p, q) = -Σ_x p(x)·log q(x)
where p(x) is the expected output of the data, q(x) is the actually computed output, and H(p, q) is the overall cross entropy;
the activation function adopted by each layer of the stack noise reduction self-encoder is the ReLU function.
CN201911032677.2A 2019-10-28 2019-10-28 Cleaning method for power transformer state monitoring data Withdrawn CN110866604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911032677.2A CN110866604A (en) 2019-10-28 2019-10-28 Cleaning method for power transformer state monitoring data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911032677.2A CN110866604A (en) 2019-10-28 2019-10-28 Cleaning method for power transformer state monitoring data

Publications (1)

Publication Number Publication Date
CN110866604A true CN110866604A (en) 2020-03-06

Family

ID=69653477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911032677.2A Withdrawn CN110866604A (en) 2019-10-28 2019-10-28 Cleaning method for power transformer state monitoring data

Country Status (1)

Country Link
CN (1) CN110866604A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111404274A (en) * 2020-04-29 2020-07-10 平顶山天安煤业股份有限公司 Online monitoring and early warning system for displacement of power transmission system
CN112287562A (en) * 2020-11-18 2021-01-29 国网新疆电力有限公司经济技术研究院 Power equipment retired data completion method and system
CN112329875A (en) * 2020-11-16 2021-02-05 电子科技大学 Continuous image sequence identification method based on continuous attractor network
CN112699921A (en) * 2020-12-16 2021-04-23 重庆邮电大学 Stack denoising self-coding-based power grid transient fault data clustering cleaning method
CN112784906A (en) * 2021-01-26 2021-05-11 中国科学院半导体研究所 Agricultural machinery monitoring data cleaning method and device based on multi-condition time sequence
CN113049035A (en) * 2021-03-12 2021-06-29 辽宁工程技术大学 Transformer state monitoring system based on Internet of things
CN113313144A (en) * 2021-05-08 2021-08-27 徐焕 Power transformer fault diagnosis system and method based on big data
CN113591401A (en) * 2021-08-24 2021-11-02 华北电力大学(保定) Power transformer data cleaning method based on time series decomposition
CN113627337A (en) * 2021-08-10 2021-11-09 吉林大学 Force touch signal processing method based on stack type automatic coding
CN113762519A (en) * 2020-06-03 2021-12-07 杭州海康威视数字技术股份有限公司 Data cleaning method, device and equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160093048A1 (en) * 2014-09-25 2016-03-31 Siemens Healthcare Gmbh Deep similarity learning for multimodal medical images
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis
CN108398268A (en) * 2018-03-15 2018-08-14 哈尔滨工业大学 A kind of bearing performance degradation assessment method based on stacking denoising self-encoding encoder and Self-organizing Maps
CN109213753A (en) * 2018-08-14 2019-01-15 西安理工大学 A kind of industrial system monitoring data restoration methods based on online PCA
CN109598336A (en) * 2018-12-05 2019-04-09 国网江西省电力有限公司信息通信分公司 A kind of Data Reduction method encoding neural network certainly based on stack noise reduction
CN109978079A (en) * 2019-04-10 2019-07-05 东北电力大学 A kind of data cleaning method of improved storehouse noise reduction self-encoding encoder
CN110006650A (en) * 2019-03-18 2019-07-12 华中科技大学 A kind of method for diagnosing faults based on the sparse denoising autocoder of stack beta pruning
CN110009529A (en) * 2019-04-15 2019-07-12 湖南大学 A kind of transient frequency acquisition methods based on storehouse noise reduction autocoder
CN110298264A (en) * 2019-06-10 2019-10-01 上海师范大学 Based on the human body daily behavior activity recognition optimization method for stacking noise reduction self-encoding encoder

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160093048A1 (en) * 2014-09-25 2016-03-31 Siemens Healthcare Gmbh Deep similarity learning for multimodal medical images
US20180165554A1 (en) * 2016-12-09 2018-06-14 The Research Foundation For The State University Of New York Semisupervised autoencoder for sentiment analysis
CN108398268A (en) * 2018-03-15 2018-08-14 哈尔滨工业大学 A kind of bearing performance degradation assessment method based on stacking denoising self-encoding encoder and Self-organizing Maps
CN109213753A (en) * 2018-08-14 2019-01-15 西安理工大学 A kind of industrial system monitoring data restoration methods based on online PCA
CN109598336A (en) * 2018-12-05 2019-04-09 国网江西省电力有限公司信息通信分公司 A kind of Data Reduction method encoding neural network certainly based on stack noise reduction
CN110006650A (en) * 2019-03-18 2019-07-12 华中科技大学 A kind of method for diagnosing faults based on the sparse denoising autocoder of stack beta pruning
CN109978079A (en) * 2019-04-10 2019-07-05 东北电力大学 A kind of data cleaning method of improved storehouse noise reduction self-encoding encoder
CN110009529A (en) * 2019-04-15 2019-07-12 湖南大学 A kind of transient frequency acquisition methods based on storehouse noise reduction autocoder
CN110298264A (en) * 2019-06-10 2019-10-01 上海师范大学 Based on the human body daily behavior activity recognition optimization method for stacking noise reduction self-encoding encoder

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
代杰杰; 宋辉; 杨祎; 陈玉峰; 盛戈皞; 江秀臣: "State data cleaning method for power transmission and transformation equipment based on stacked denoising autoencoder" *
侯文擎; 叶鸣; 李巍华: "Rolling bearing fault classification based on improved stacked denoising autoencoder" *
"Press bearing fault diagnosis method based on SDAE-SVM" *
张成刚; 姜静清: "Research on a sparse denoising autoencoder neural network" *
许倩文; 吉兴全; 张玉振; 李军; 于永进: "Application of stacked denoising autoencoder network in transformer fault diagnosis" *
贾文其; 李明; 朱美强; 王军: "License plate character recognition based on stacked denoising autoencoder neural network" *
贾文娟; 张煜东: "A survey of autoencoder theory and methods" *
赵敏; 王慧卿; 张超; 李洋; 张建亮; 高枫; 任学武: "Data cleaning algorithm for power information and communication asset data based on autoencoder" *
陈海燕; 杜婧涵; 张魏宁: "Monitoring data repair method based on deep denoising autoencoder network" *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111404274A (en) * 2020-04-29 2020-07-10 平顶山天安煤业股份有限公司 Online monitoring and early warning system for displacement of power transmission system
CN113762519A (en) * 2020-06-03 2021-12-07 杭州海康威视数字技术股份有限公司 Data cleaning method, device and equipment
CN112329875A (en) * 2020-11-16 2021-02-05 电子科技大学 Continuous image sequence identification method based on continuous attractor network
CN112329875B (en) * 2020-11-16 2022-05-03 电子科技大学 Continuous image sequence identification method based on continuous attractor network
CN112287562A (en) * 2020-11-18 2021-01-29 国网新疆电力有限公司经济技术研究院 Power equipment retired data completion method and system
CN112287562B (en) * 2020-11-18 2023-03-10 国网新疆电力有限公司经济技术研究院 Power equipment retired data completion method and system
CN112699921A (en) * 2020-12-16 2021-04-23 重庆邮电大学 Stack denoising self-coding-based power grid transient fault data clustering cleaning method
CN112699921B (en) * 2020-12-16 2022-07-15 重庆邮电大学 Stack denoising self-coding-based power grid transient fault data clustering cleaning method
CN112784906A (en) * 2021-01-26 2021-05-11 中国科学院半导体研究所 Agricultural machinery monitoring data cleaning method and device based on multi-condition time sequence
CN112784906B (en) * 2021-01-26 2024-02-02 中国科学院半导体研究所 Agricultural machinery monitoring data cleaning method and device based on multi-condition time sequence
CN113049035B (en) * 2021-03-12 2022-05-27 辽宁工程技术大学 Transformer state monitoring system based on Internet of things
CN113049035A (en) * 2021-03-12 2021-06-29 辽宁工程技术大学 Transformer state monitoring system based on Internet of things
CN113313144A (en) * 2021-05-08 2021-08-27 徐焕 Power transformer fault diagnosis system and method based on big data
CN113627337A (en) * 2021-08-10 2021-11-09 吉林大学 Force touch signal processing method based on stack type automatic coding
CN113627337B (en) * 2021-08-10 2023-12-05 吉林大学 Force touch signal processing method based on stack type automatic coding
CN113591401A (en) * 2021-08-24 2021-11-02 华北电力大学(保定) Power transformer data cleaning method based on time series decomposition
CN113591401B (en) * 2021-08-24 2023-10-20 华北电力大学(保定) Power transformer data cleaning method based on time sequence decomposition

Similar Documents

Publication Publication Date Title
CN110866604A (en) Cleaning method for power transformer state monitoring data
CN111710150A (en) Abnormal electricity consumption data detection method based on countermeasure self-coding network
CN111880044B (en) Online fault positioning method for distribution network containing distributed power supply
CN109213753B (en) Industrial system monitoring data recovery method based on online PCA
US20230112749A1 (en) Transformer state evaluation method based on echo state network and deep residual neural network
CN111507046B (en) Method and system for predicting remaining service life of electric gate valve
CN113141008B (en) Data-driven power distribution network distributed new energy consumption capability assessment method
CN111159638A (en) Power distribution network load missing data recovery method based on approximate low-rank matrix completion
CN114266301A (en) Intelligent power equipment fault prediction method based on graph convolution neural network
CN114661905A (en) Power grid fault diagnosis method based on BERT
CN113361559A (en) Multi-mode data knowledge information extraction method based on deep width joint neural network
CN111678679A (en) Circuit breaker fault diagnosis method based on PCA-BPNN
CN111080001A (en) Deep neural network prediction method applied to wind speed of wind power plant
Lu et al. A cloud-edge collaborative intelligent fault diagnosis method based on LSTM-VAE hybrid model
CN116484740A (en) Line parameter identification method based on space topology characteristics of excavated power grid
CN115766504A (en) Method for detecting cycle time sequence abnormity
CN113344283B (en) Energy internet new energy consumption capability assessment method based on edge intelligence
CN114980723A (en) Fault prediction method and system for cross-working-condition chip mounter suction nozzle
Mao et al. Communication-efficient federated learning for power load forecasting in electric iots
CN113642244A (en) Power metering equipment fault prediction method based on artificial intelligence
Yuan et al. A novel hybrid short-term wind power prediction framework based on singular spectrum analysis and deep belief network utilized improved adaptive genetic algorithm
CN111008584A (en) Electric energy quality measurement deficiency repairing method of fuzzy self-organizing neural network
Zheng et al. Abrupt change fault, slow change fault and intermittent fault data oversampling technique for health evaluation of transmission equipment in electric power communication network
Fan Research on Transformer Fault Diagnosis Method Based on Deep Learning Algorithm Optimization
CN113721162B (en) Fusion magnet power failure intelligent diagnosis method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200306