Summary of the invention
The purpose of the invention is to devise neural network models for carrying out fault diagnosis on transformers. Existing diagnosis of transformers by the dissolved gas analysis method has used a common multilayer perceptron to model the concentrations of the gases dissolved in the transformer oil; the invention instead constructs a transformer fault diagnosis method based on a GoogleNet-style model.
The technical solution adopted by the present invention is as follows:
First, the factors that cause equipment failure are obtained, the data that can influence equipment faults are considered, and the data and feature space to be acquired are determined. The fault types that the equipment can exhibit are determined, forming the state space. The transformer state is monitored and data are acquired from the transformer, yielding its features and states. A neural network model is then built and trained with the collected data, and the trained model diagnoses faults from the equipment features.
A MyNet model is built with reference to the structure of GoogleNet. The key to GoogleNet is the Inception module; the Inception module used in this patent is the later version, in which each 5 × 5 convolution is replaced by two 3 × 3 convolutions. The 1 × 1 convolutions inside the Inception module compress the data, which may cause information loss and hurt performance if Inception modules are used too early in the network; therefore, the beginning of the network consists of ordinary convolutional layers. In the overall structure diagram of the network (maximum number of convolution kernels: 1024), k denotes the kernel size, s the stride, and f the number of kernels in convolutional layers; in pooling layers, k denotes the pooling kernel size and s the stride. Batch normalization is not shown explicitly in the table; by default, a batch normalization layer follows every convolutional layer.
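As an illustrative sketch (not part of the claimed structure), the effect of the k/s notation above and of replacing one 5 × 5 convolution with two stacked 3 × 3 convolutions can be checked numerically; the input size of 28 and channel count of 64 below are hypothetical examples:

```python
def conv_out_size(n, k, s, p=0):
    # Spatial output size of a convolution/pooling layer:
    # n = input size, k = kernel size, s = stride, p = padding
    return (n + 2 * p - k) // s + 1

# Two stacked 3x3 convolutions produce the same output size (and cover the
# same 5x5 receptive field) as a single 5x5 convolution...
assert conv_out_size(28, 5, 1) == conv_out_size(conv_out_size(28, 3, 1), 3, 1)

# ...but with fewer weights per channel pair: 2*(3*3) = 18 versus 5*5 = 25
c = 64                                   # hypothetical channel count
params_5x5 = 5 * 5 * c * c
params_two_3x3 = 2 * (3 * 3 * c * c)
assert params_two_3x3 < params_5x5
```

This is the standard argument for the later Inception design: the same receptive field at a lower parameter cost, with an extra nonlinearity between the two 3 × 3 layers.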
Fault diagnosis uses dissolved gas analysis (DGA): the concentrations of the various gases dissolved in the transformer oil are analysed to diagnose the health status and fault type of the transformer. By detecting the concentrations of gases such as H2, C2H2, C2H4, C2H6, CH4 and CO, fault information such as partial discharge, low-energy discharge, high-energy discharge, low-temperature overheating and high-temperature overheating can be captured.
A neural network performs the fault detection on the transformer. The concentrations of the relevant gases and the fault types are input to train the neural network; after training, fault diagnosis is carried out on the training and validation sets, i.e. the gas concentrations are input, the model outputs the fault type, and the accuracy is finally computed. Specifically, a fully connected multilayer perceptron is used, with different hidden-layer depths and neuron counts: the number of hidden layers is 1 or 2, and the number of neurons per hidden layer is 3, 6 or 12, giving 6 combinations in total.
The neural network structure is a multilayer perceptron of fully connected layers. The input layer has 6 neurons; layers 1 and 2 each have 12 neurons; a dropout layer with drop rate 0.3 follows layer 2; and the output layer is a softmax classifier with 4 neurons. The hyperparameters are: learning rate 7e-2, learning-rate decay 0.95 applied once every 20 passes over the training set, batch size 32, 1000 passes over the training set, and Xavier weight initialisation.
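A minimal numpy sketch of the forward pass of the structure just described (6 inputs, two hidden layers of 12 neurons, dropout 0.3, softmax over 4 classes, Xavier initialisation). The ReLU activation is an assumption, since the text does not name the hidden-layer activation:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier(n_in, n_out):
    # Xavier/Glorot uniform initialisation
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# 6 gas concentrations -> 12 -> 12 -> 4 fault classes
W1, b1 = xavier(6, 12), np.zeros(12)
W2, b2 = xavier(12, 12), np.zeros(12)
W3, b3 = xavier(12, 4), np.zeros(4)

def forward(x, train=False, drop=0.3):
    h = relu(x @ W1 + b1)
    h = relu(h @ W2 + b2)
    if train:                              # dropout only at training time
        mask = rng.random(h.shape) >= drop
        h = h * mask / (1.0 - drop)        # inverted dropout scaling
    return softmax(h @ W3 + b3)

probs = forward(rng.random((32, 6)))       # one batch of size 32
```

Each row of `probs` is a probability distribution over the 4 fault classes; the predicted class is the row-wise argmax.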
Carrying out transformer fault diagnosis based on GoogleNet is more accurate and more flexible than the non-coded ratio method. The non-coded ratio method can only diagnose specific fault types, whereas the neural network has no such limitation: any fault type contained in the data set can be diagnosed.
A multilayer perceptron contains an input layer, hidden layers and an output layer. Typically there is one input layer and one output layer, while the number of hidden layers is unrestricted: one layer or several. Each layer can have multiple nodes called neurons; each layer takes the previous layer's output as its input and passes its own output to the next layer as input. The operation of an ordinary multilayer perceptron is a linear multiplication and accumulation followed by a nonlinear activation function. Note that throughout this patent, variables in bold denote vectors or matrices. Suppose that in some layer the input is x = (x1, x2, ..., xn), the parameters are w, and the activation function is g; then the output of each neuron is yi = g(Σj wij xj + bi), and the output of the whole layer is y = g(wx^T + b). A layer in which every neuron has its own parameters in this way is called a fully connected layer, and a serious drawback of fully connected layers is that the number of parameters is too large: the parameter count of a fully connected layer is the previous layer's neuron count m multiplied by this layer's neuron count n, i.e. m × n. Suppose the network input is a 1000 × 1000 RGB image and the first layer has 1000 neurons; then the parameter count of this layer is 3 × 1000^3 = 3 × 10^9, which occupies a very large amount of resources.
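The parameter count in the example above can be verified with a line of arithmetic (bias terms ignored):

```python
# Fully connected layer: parameters = (input neurons m) x (output neurons n)
def fc_params(m, n):
    return m * n

# A 1000 x 1000 RGB image flattened to 3 * 10**6 inputs, first layer of 1000 neurons
assert fc_params(1000 * 1000 * 3, 1000) == 3 * 10**9
```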
Specific embodiment
In order to make the technical means, creative features, objectives and effects achieved by the present invention easy to understand, the present invention is further explained below with reference to specific embodiments.
The specific workflow of the transformer fault diagnosis method based on the GoogleNet model is shown in Figure 1.
Fault diagnosis of the equipment with a neural network roughly comprises the following steps:
A. First obtain the factors that cause equipment failure, consider the data that can influence equipment faults, and determine the data and feature space to be acquired;
B. Determine the fault types that the equipment can exhibit, forming the state space;
C. Monitor the transformer state and acquire data from the transformer, obtaining its features and states; build a neural network model and train it with the collected data;
D. Carry out fault diagnosis from the equipment features with the trained model.
The fault diagnosis uses dissolved gas analysis: the concentrations of the various gases dissolved in the transformer oil are analysed to diagnose the health status and fault type of the transformer. By detecting the concentrations of gases such as H2, C2H2, C2H4, C2H6, CH4 and CO, fault information such as partial discharge, low-energy discharge, high-energy discharge, low-temperature overheating and high-temperature overheating can be captured.
Moreover, the fault diagnosis uses a neural network for fault detection on the transformer. The concentrations of the relevant gases and the fault types are input to train the neural network; after training, fault diagnosis is carried out on the training and validation sets, i.e. the gas concentrations are input, the model outputs the fault type, and the accuracy is finally computed. Specifically, a fully connected multilayer perceptron is used, with different hidden-layer depths and neuron counts: the number of hidden layers is 1 or 2, and the number of neurons per hidden layer is 3, 6 or 12, giving 6 combinations in total.
In addition, the neural network structure is a multilayer perceptron of fully connected layers. The input layer has 6 neurons; layers 1 and 2 each have 12 neurons; a dropout layer with drop rate 0.3 follows layer 2; and the output layer is a softmax classifier with 4 neurons. The hyperparameters are: learning rate 7e-2, learning-rate decay 0.95 applied once every 20 passes over the training set, batch size 32, 1000 passes over the training set, and Xavier weight initialisation.
The described GoogleNet-based transformer fault diagnosis is more accurate and more flexible than the non-coded ratio method; the non-coded ratio method can only diagnose specific fault types, whereas the neural network has no such limitation: any fault type contained in the data set can be diagnosed.
The classifier used for fault diagnosis with the neural network is a softmax classifier, which outputs the probability of each type; the type with the highest probability is taken as the type predicted by the model. The neural network ends with a loss function that measures the error between the model output and the true output.
The loss function for softmax is the cross-entropy L = −Σi yi log(ŷi), where yi is the true label and ŷi the predicted probability. The process of model training is to reduce this error: gradient descent computes the derivative of the loss function with respect to each parameter, ∂L/∂θ, and each parameter is then decreased by this derivative multiplied by a coefficient, θ ← θ − α ∂L/∂θ, where α is called the learning rate.
The multilayer perceptron contains an input layer, hidden layers and an output layer; typically there is one input layer and one output layer, while the number of hidden layers is unrestricted: one layer or several.
Each layer can have multiple nodes called neurons; each layer takes the previous layer's output as its input and passes its own output to the next layer. The operation of an ordinary multilayer perceptron is a linear multiplication and accumulation followed by a nonlinear activation function.
Suppose that in some layer the input is x = (x1, x2, ..., xn), the parameters are w, and the activation function is g; then the output of each neuron is yi = g(Σj wij xj + bi), and the output of the whole layer is y = g(wx^T + b). A layer in which every neuron has its own parameters in this way is called a fully connected layer, and a serious drawback of fully connected layers is that the number of parameters is too large: the parameter count of a fully connected layer is the previous layer's neuron count m multiplied by this layer's neuron count n, i.e. m × n.
Suppose the network input is a 1000 × 1000 RGB image and the first layer has 1000 neurons; then the parameter count of this layer is 3 × 1000^3 = 3 × 10^9, which occupies a very large amount of resources.
GoogleNet is a convolutional neural network model proposed by a team at Google in 2014. While its network layers are deeper and its performance higher, GoogleNet also maintains high computational efficiency. GoogleNet has 22 layers in total, has no fully connected layers, and has only 5 million parameters, 1/12 of the earlier AlexNet. The key that lets GoogleNet obtain high performance and high efficiency at the same time is its Inception module.
The Inception module is a good network topology, a network within the network; GoogleNet forms the whole network model by stacking Inception modules. In convolutional neural network operations, the size of the convolution kernel is a hyperparameter and can be 3 × 3, 5 × 5 or 7 × 7; different sizes may yield different effects and performance, and hyperparameters have always been debugged and determined manually. The main idea behind the Inception module is to let the neural network decide for itself: these hyperparameters are learned by the network through training on the data set.
The method is, in a given layer, to carry out 1 × 1, 3 × 3 and 5 × 5 convolutions and max pooling simultaneously and then stack the results of these operations laterally as the output of the layer. This, however, causes the number of parameters to explode, occupying huge amounts of memory and lowering computational efficiency. The solution is to reduce the length of the third dimension of the data with 1 × 1 convolutions; in the image domain the third dimension is also called the channel dimension. The number of convolution kernels determines the number of channels of the convolution output, so as long as the number of kernels is smaller than the number of input channels, the data are compressed and the number of parameters is reduced.
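The channel-compression effect of a 1 × 1 convolution can be sketched in numpy: per pixel it is simply a linear map over the channel dimension. The sizes 28 × 28, 192 and 64 below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# feature map: height x width x channels
x = rng.random((28, 28, 192))

# a 1x1 convolution with 64 kernels is a per-pixel linear map over channels
w = rng.random((192, 64))      # 192 input channels -> 64 output channels
y = x @ w                      # broadcasts over the 28 x 28 spatial positions

print(y.shape)                 # (28, 28, 64): channels compressed 192 -> 64
```

The spatial dimensions are untouched; only the channel count shrinks, which is exactly what makes the 3 × 3 and 5 × 5 branches of an Inception module affordable.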
The final result of fault diagnosis with the neural network is the diagnosed health status and fault type of the equipment. The classifier used in this patent is a softmax classifier, which outputs the probability of each type; the type with the highest probability is taken as the type predicted by the model. The neural network ends with a loss function that measures the error between the model output and the true output; the loss function for softmax is the cross-entropy L = −Σi yi log(ŷi).
The process of model training is to reduce this error; this process is called optimisation. The optimisation method in neural networks is gradient descent: the derivative of the loss function with respect to each parameter, ∂L/∂θ, is computed, and each parameter is then decreased by this derivative multiplied by a coefficient, θ ← θ − α ∂L/∂θ, where α is called the learning rate.
The neural network computes the derivative with respect to each parameter by backpropagation. The training process first computes the output of each layer and the value of the loss function from the low layers towards the high layers, and then, following the chain rule of differentiation, computes the derivatives from the high layers back towards the low layers. Classifying equipment faults with neural networks broadly falls into two classes: one directly trains a multilayer classification model (supervised learning); the other first performs feature extraction (self-supervised learning) and then trains a single-layer classifier (supervised learning). Both methods are introduced below.
Directly training a multilayer classification model means direct supervised learning: a feedforward neural network with a classifier is trained with data and labels. This patent uses voltage and current data with the corresponding labels to judge internal versus external faults of the transmission line and to select the faulty phase. Two different models are used: a fully connected multilayer perceptron and a convolutional neural network. Both models are described below.
Multi-layer perception (MLP) (MLP) is also feedforward neural network, and target is to train a function f (x), is allowed to as far as possible
Ground is close to true function model.For example, one sorter model of training, is mapped to classification c for f (x).The mould of multi-layer perception (MLP)
Type be it is fixed, training be parameter
A multilayer perceptron contains an input layer, hidden layers and an output layer; typically there is one input layer and one output layer, while the number of hidden layers is unrestricted: one layer or several. Each layer can have multiple nodes called neurons; each layer takes the previous layer's output as its input and passes its own output to the next layer.
The operation of an ordinary multilayer perceptron is a linear multiplication and accumulation followed by a nonlinear activation function.
Note that throughout this patent, variables in bold denote vectors or matrices. Suppose that in some layer the input is x = (x1, x2, ..., xn), the parameters are w, and the activation function is g; then the output of each neuron is:
yi = g(Σj wij xj + bi)   (3)
The output of the entire layer is:
y = g(wx^T + b)   (4)
A layer in which every neuron has its own parameters in this way is called a fully connected layer; a serious drawback of fully connected layers is that the number of parameters is too large. The parameter count of a fully connected layer is the previous layer's neuron count m multiplied by this layer's neuron count n, i.e. m × n. Suppose the network input is a 1000 × 1000 RGB image and the first layer has 1000 neurons; then the parameter count of this layer is 3 × 1000^3 = 3 × 10^9, which occupies a very large amount of resources. Convolutional neural networks (LeCun, 1989) arose for this reason.
Convolutional neural networks (CNNs) are neural networks for processing data with a spatial structure, such as image data (which has a two-dimensional structure). Convolution is also a linear multiply-and-accumulate operation, but it differs from the linear operation of the multilayer perceptron described above. A convolutional network is any neural network that substitutes convolution for the ordinary linear operation in at least one of its layers.
The parameters of a convolutional neural network are called convolution kernels. The convolution operation can be one-, two- or three-dimensional; for a two-dimensional convolution, the input, output and kernel are all two-dimensional. The convolution operation slides the kernel over the input matrix; at each position, every kernel parameter is multiplied by the input at the corresponding position and the products are summed, giving the output at one position. Each slide computes one output value, and the complete output is obtained after the kernel has swept the input, as shown in Figure 4. Assuming the kernel dimension is k × k, the output is:
y(i, j) = Σm Σn w(m, n) x(i + m − 1, j + n − 1),  m, n = 1, ..., k   (5)
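The sliding-window computation just described can be sketched as a plain double loop (as in most CNN frameworks, the kernel is applied without flipping, i.e. as cross-correlation):

```python
import numpy as np

def conv2d(x, k):
    # Valid 2-D convolution: the kernel slides over x; at each position the
    # kernel entries are multiplied with the matching inputs and summed.
    kh, kw = k.shape
    H = x.shape[0] - kh + 1
    W = x.shape[1] - kw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16.0).reshape(4, 4)
k = np.ones((2, 2))
y = conv2d(x, k)
print(y.shape)   # (3, 3); each entry sums one 2x2 window of x
```

With a k × k kernel and no padding, a n × n input yields a (n − k + 1) × (n − k + 1) output, matching the sliding count described above.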
There can be multiple convolution kernels, and the length of the third dimension of the convolution result equals the number of kernels. Convolutional neural networks improve the efficiency of neural networks through two important features, sparse connectivity and parameter sharing, reducing resource consumption. As mentioned above, if a layer of the network has a inputs and b outputs, a fully connected layer needs a × b parameters and the time complexity of the algorithm is O(a × b). In a convolutional neural network, if we limit the number of connections each output possesses to c, the sparse connection scheme needs only c × b operations and O(c × b) time. In many practical applications, keeping c several orders of magnitude smaller than a is enough to obtain good performance. Parameter sharing means that the parameters of different neurons in the network are identical: the parameter sharing of the convolution operation lets us train only one parameter set instead of learning an independent parameter set for every neuron. Although this does not reduce the time complexity of the algorithm, it reduces the number of stored parameters to c, far smaller than a × b, which greatly reduces the storage demand [11].
Besides convolution, convolutional neural networks have another very important operation: pooling. Pooling comes in two forms, max pooling and average pooling: max pooling takes the maximum within an adjacent region, and average pooling takes the average within an adjacent region. Like convolution, the pooling window slides over the input matrix, and each slide computes the value of one neuron. Pooling often follows convolution and can reduce the dimensionality of the data; for example, with a 2 × 2 pooling kernel and a stride of 2, the height and width of the data are both halved and the amount of data is reduced to 1/4. Another important role of pooling is to maintain invariance of the input: when values other than the maximum within a max-pooling region change, the output of the pooling is unchanged, and a small translation does not change the pooling output. When we only care whether certain features occur, and not where they occur, this local invariance of pooling is very useful and benefits the performance of the network.
Image processing at present mostly uses convolutional neural networks; convolutional neural networks are developing rapidly, and complex, high-performance network models are constantly being proposed. AlexNet (2012), VGGNet (2014), GoogleNet (2014) and ResNet (2015) are all very high-performance network models. In the transformer fault diagnosis experiments described in this patent, network structures similar to GoogleNet and ResNet are used; the principle of GoogleNet is described below.
GoogleNet is a convolutional neural network model proposed by a team at Google in 2014; in that year's ImageNet Large Scale Visual Recognition Challenge (ILSVRC), it won the championship with an error rate of only 6.7%. While its network layers are deeper and its performance higher, GoogleNet also maintains high computational efficiency: it has 22 layers in total, no fully connected layers, and only 5 million parameters, 1/12 of the earlier AlexNet. The key that lets GoogleNet obtain high performance and high efficiency at the same time is its Inception module. The Inception module is a good network topology, a network within the network; GoogleNet forms the whole network model by stacking Inception modules. In convolutional neural network operations, the size of the convolution kernel is a hyperparameter and can be 3 × 3, 5 × 5 or 7 × 7; different sizes may yield different effects and performance. The main idea of the Inception module is that, rather than manually deciding the kernel size and when to pool, it is better to let the network decide for itself and learn these hyperparameters through training on the data set. The method is, in a given layer, to carry out 1 × 1, 3 × 3 and 5 × 5 convolutions and max pooling simultaneously and then stack the results of these operations laterally as the output of the layer. This, however, causes the number of parameters to explode, occupying huge amounts of memory and lowering computational efficiency; the solution is to reduce the length of the third dimension of the data with 1 × 1 convolutions, the third dimension being also called the channel dimension in the image domain. The number of convolution kernels determines the number of channels of the convolution output, so as long as the number of kernels is smaller than the number of input channels, the data are compressed and the number of parameters is reduced.
This patent uses a neural network for transformer fault detection. The concentrations of the relevant gases and the fault types are input to train the neural network; after training, fault diagnosis is carried out on the training and validation sets, i.e. the gas concentrations are input, the model outputs the fault type, and the accuracy is finally computed. Specifically, a fully connected multilayer perceptron is used, with different hidden-layer depths and neuron counts: the number of hidden layers is 1 or 2, and the number of neurons per hidden layer is 3, 6 or 12, giving 6 combinations in total.
For comparison, the experiments also diagnose the transformer fault types with the non-coded ratio method proposed by Du. The non-coded ratio method judges the fault type according to the values of C2H2/C2H4, C2H4/C2H6 and CH4/H2 in the transformer oil; the specific diagnostic rules are shown in Table 1.
Table 1. Non-coded ratio fault diagnosis method
The data set used in this experiment comes from document [20] and contains 200 samples in total. Each sample contains the concentrations of the 6 gases H2, C2H2, C2H4, C2H6, CH4 and CO together with a fault type. The fault types are high-energy discharge, low-energy discharge, thermal fault and no fault, encoded as 0 to 3.
Because the data set is very small, cross-validation is used. The data set is divided into 4 equal parts of 50 samples each, with an equal number of samples of each fault type in each part. One part serves as the validation set and the rest as the training set; this is rotated 4 times for cross-validation.
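The 4-fold rotation can be sketched as follows; for brevity this sketch shuffles randomly rather than stratifying by fault type as the experiment does:

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 samples split into 4 equal folds of 50
indices = rng.permutation(200)
folds = np.array_split(indices, 4)

for i in range(4):
    val_idx = folds[i]                        # 50 validation samples
    train_idx = np.concatenate(               # remaining 150 training samples
        [folds[j] for j in range(4) if j != i])
    assert len(val_idx) == 50 and len(train_idx) == 150
```

Each sample appears in the validation set exactly once over the 4 rotations, so every accuracy figure is computed on held-out data.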
Assuming the mean of each variable of the data is μ and the standard deviation is σ, the data preprocessing formula is:
x = (x − μ)/σ   (6)
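Equation (6), applied per gas-concentration column, as a numpy sketch with made-up values:

```python
import numpy as np

def standardize(X):
    # Per-column standardisation: x -> (x - mu) / sigma
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[2.0, 10.0],
              [4.0, 20.0],
              [6.0, 30.0]])
Z = standardize(X)
# each standardised column now has mean 0 and standard deviation 1
```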
The neural network structure used in this patent is a multilayer perceptron with only fully connected layers; there are 6 combinations in total. One of them is given here as an illustration: the input layer has 6 neurons, layers 1 and 2 each have 12 neurons, a dropout layer with drop rate 0.3 follows layer 2, and the output layer is a softmax classifier with 4 neurons, as shown in Table 2.
Table 2. Neural network structure
The hyperparameters are: learning rate 7e-2, learning-rate decay 0.95 applied once every 20 passes over the training set, batch size 32, 1000 passes over the training set, and Xavier weight initialisation.
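The stated step decay (initial rate 7e-2, multiplied by 0.95 once every 20 passes) corresponds to a schedule like the following sketch:

```python
def lr_schedule(epoch, lr0=7e-2, decay=0.95, every=20):
    # Multiply the learning rate by `decay` once every `every` passes
    # over the training set
    return lr0 * decay ** (epoch // every)

assert lr_schedule(0) == 7e-2
assert lr_schedule(19) == 7e-2                    # no decay within the first 20 passes
assert abs(lr_schedule(20) - 7e-2 * 0.95) < 1e-12 # first decay at pass 20
```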
The neural networks used in the transformer diagnosis experiment have 6 combinations in total, and each combination is trained 4 times. The training set accuracies are shown in Table 3 and the validation set accuracies in Table 4.
Table 3. Training set accuracy
Table 4. Validation set accuracy
Analysis of the results shows that, among the 6 combinations, only the combination with 1 hidden layer of 3 neurons has a lower accuracy: its average training set accuracy is 86.35% and its validation set accuracy is 81.50%. The remaining combinations reach training set accuracies of 97.84%–98.17% and validation set accuracies of 96.00%–97.50%. When the total number of hidden-layer neurons is 6 or more, the neural network model achieves good results; further enlarging the network model does not improve the accuracy, which remains essentially unchanged, and the small differences between experimental results can be regarded as error caused by the small data set. The non-coded ratio method cannot diagnose the no-fault case, so it is used only to diagnose the fault types of 100 of the samples, with an accuracy of 88.00%. The comparison of the two methods is shown in Table 5.
Table 5. Comparison of the neural network and the non-coded ratio method
The experiments show that transformer fault diagnosis with a neural network is more accurate and more flexible than the non-coded ratio method: the non-coded ratio method can only diagnose specific fault types, whereas the neural network has no such limitation, since any fault type contained in the data set can be diagnosed. Therefore, the transformer fault diagnosis method based on the GoogleNet model proposed in this patent is effective.
The above is only an embodiment of the present invention and is not intended to limit the scope of the invention. All equivalent structures or equivalent process transformations made using the contents of the description and the accompanying drawings of the present invention, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of patent protection of the present invention.