CN113743534B - Transformer oil gas composite imaging identification method based on depth residual error network - Google Patents


Info

Publication number
CN113743534B
CN113743534B (application CN202111095213.3A)
Authority
CN
China
Prior art keywords
gas
network
fault
depth residual
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111095213.3A
Other languages
Chinese (zh)
Other versions
CN113743534A (en)
Inventor
胡昊
马鑫
郑野
尚毅梓
张长辉
李新凯
张红涛
李擎
钟凌
职保平
Current Assignee
Yellow River Conservancy Technical Institute
Original Assignee
Yellow River Conservancy Technical Institute
Priority date
Filing date
Publication date
Application filed by Yellow River Conservancy Technical Institute filed Critical Yellow River Conservancy Technical Institute
Priority to CN202111095213.3A priority Critical patent/CN113743534B/en
Publication of CN113743534A publication Critical patent/CN113743534A/en
Application granted granted Critical
Publication of CN113743534B publication Critical patent/CN113743534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Housings And Mounting Of Transformers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a composite-imaging identification method for gases in transformer oil based on a deep residual network, which comprises the following steps: firstly, combining gas concentration, sampling time and sampling-point temperature into a new gas feature vector, and performing data enhancement and fault evaluation on the gas feature vector to obtain a training set and a test set; secondly, optimizing the soft-threshold objective function of the common-channel deep residual shrinkage network with the alternating direction method of multipliers (ADMM) and the FISTA algorithm to obtain a sub-channel-threshold deep residual shrinkage network; and finally, training and testing on the training set and the test set with the sub-channel-threshold deep residual shrinkage network, and performing visual dimensionality reduction on the output with the UMAP algorithm to obtain the fault type. The invention optimizes and improves the network, uses ADMM to solve the problem of partial solution loss caused by sparse matrix transposition in the network, and uses the FISTA and UMAP algorithms to accelerate the output speed of the network.

Description

Method for recognizing gas composite imaging in transformer oil based on depth residual error network
Technical Field
The invention relates to the technical field of transformer fault identification, in particular to a transformer oil gas composite imaging identification method based on a deep residual error network.
Background
Most domestic transformers are of the oil-paper combined insulation type, and various faults can occur in daily operation, such as abnormal operating temperature, arc discharge, partial discharge and insulation degradation. When an internal failure occurs, different types of discharge or temperature can crack the insulating oil, producing a variety of alkane gases and other gases. Because the transformer is highly encapsulated, maintenance personnel cannot visually observe its internal condition, and different fault types deteriorate differently and cause different degrees of damage; taking maintenance measures based on a low-accuracy accident diagnosis could even cause casualties. Transformer fault identification must therefore be efficient and accurate.
The approximate location and severity of an accident can be judged by analyzing the components and concentrations of fault gases. The traditional DGA-based methods currently in use, such as the three-ratio method, the Rogers ratio method, the Dornenburg diagnostic method and the Duval Triangle method, are limited in precision, and some are not sensitive enough to DGA gas data to make an accurate diagnosis. To improve the accuracy of diagnosing and identifying gas faults in transformer oil, researchers have studied transformer fault identification with statistical analysis, machine learning and similar methods. One study [Benhamed K, Mooman A, Younes A, et al. Feature selection for efficient health indexes of power transformers. IEEE Transactions on Power Delivery, 2017: 1-1] combined simulation analysis with a model and, using a GRNN on a subsystem basis, obtained several key factors affecting the health state of the transformer, so that the fault location and accident type of the transformer can be judged accurately. Another [G. K. Irungu, A. O. Akumu and J. L. Munda, "Fault diagnostic technique in oil-filled electrical equipment: the dual of Duval Triangle," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 23, no. 6, pp. 3405-3410, Dec 2016] adopted the dual of the Duval Triangle to diagnose faults of oil-immersed transformers, but the method can cause accident identification results that conflict with one another.
One study [Bacha, Khmais; Souahlia, Seifeddine; Gossa, Moncef, "Power transformer fault diagnosis based on dissolved gas analysis by support vector machine," Electric Power Systems Research, vol. 83, no. 1, pp. 73-79, Feb 2012] classified the dissolved gases with a support vector machine to analyze transformer faults, combined ratios with images as the input quantities and selected suitable gas indexes for training, so transformer faults can be identified effectively; however, the scheme works well only in simulated training on small-sample data, and a large data volume can cause identification conflicts and ambiguity. Another [Li, Jinzhong; Zhang, Qiaogen; Wang, Ke, et al., "Optimal dissolved gas ratios selected by genetic algorithm for power transformer fault diagnosis based on support vector machine," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 23, no. 2, pp. 1198-1206, Apr 2016] obtained the optimal dissolved-gas ratios with a genetic algorithm and, by considering the DGA ratios together with the optimized parameters, achieved a good diagnosis rate. These algorithms therefore play a certain role in identifying gas faults in transformer oil and can, to some extent, resolve the conflicts of the traditional DGA-based methods, but the setting of certain key threshold parameters depends on large amounts of statistical data and expert experience, which limits them; when a parameter threshold is inaccurate, the identification accuracy of some machine-learning and classifier diagnosis methods drops sharply. Meanwhile, the DGA method judges from the obtained gas concentration alone, which introduces large errors, because the concentrations and growth rates of different gases differ under different conditions (temperature and fault type).
For example, below 150 deg C, if partial discharge occurs, the concentration of CH4 rises and keeps increasing as the temperature rises; above 500 deg C, partial discharge deteriorates into an arc fault, cracking then produces a certain amount of C2H2, and the CH4 concentration continues to rise; if the temperature keeps rising beyond a certain value, the CH4 concentration begins to fall. In addition, insulation degradation and similar conditions appear as the transformer runs and ages over a long time, so fault gas of a certain concentration is continuously present in the oil tank and a preset threshold is no longer completely reliable. For sub-health faults such as insulation degradation or partial discharge, and for mixed faults, timely identification in the early period can quickly prevent accident deterioration; yet the methods above ignore, to some extent, the influence of the differences between individual transformers.
The deep residual shrinkage network, an improved version of the deep residual learning network, can quickly identify the feature information of a sample while avoiding problems such as training difficulty and vanishing gradients caused by a large number of network layers. One study [S. Ma, F. Chu, and Q. Han, "Deep residual learning with demodulated time-frequency features for fault diagnosis of planetary gearbox under nonstationary running conditions," Mech. Syst. Signal Process., vol. 127, pp. 190-201, 2019] used the deep residual network to diagnose gear operating faults. Scholars have verified the advantages of the deep residual network: through training, it can identify key information within large amounts of noise interference. However, when the gas data collected from transformer oil are disturbed by external information as multiple gas components are input, the traditional deep residual network confuses the target features with the features of other interference items and cannot identify them accurately. Accurately extracting the concentration feature information of the target gases from a large amount of disturbance information is therefore essential to the invention.
Disclosure of Invention
Aiming at the defects in the background technology, the invention provides a transformer oil gas composite imaging identification method based on a deep residual error network, and solves the technical problem that the target characteristic cannot be accurately identified from other interference characteristics in the prior art.
The technical scheme of the invention is realized as follows:
a transformer oil gas composite imaging identification method based on a depth residual error network comprises the following steps:
the method comprises the following steps: acquiring the concentration of dissolved gas in the transformer oil and the temperature of a sampling point according to the sampling time, and recombining the concentration of the dissolved gas according to the sampling time and the temperature of the sampling point to obtain a gas characteristic vector;
step two: performing data enhancement on the gas characteristic vector to be used as an input vector, and performing fault assessment on the input vector to obtain a sample set, wherein the sample set comprises a training set and a testing set;
step three: constructing a soft-threshold optimization objective function based on the soft threshold in the common-channel deep residual shrinkage network, optimizing the objective function with the alternating direction method of multipliers (ADMM) and the FISTA algorithm, and taking the common-channel deep residual shrinkage network with the optimized soft threshold as the sub-channel-threshold deep residual shrinkage network;
step four: inputting the training set into a subchannel threshold depth residual shrinkage network for training to obtain a subchannel threshold depth residual shrinkage network model;
step five: and inputting the test set into a subchannel threshold depth residual shrinkage network model for identification, and performing visual dimensionality reduction on an output result by using a UMAP algorithm to obtain a fault type.
Preferably, the dissolved gases comprise seven species: CO, CO2, H2, CH4, C2H2, C2H4 and C2H6.
Preferably, the method for recombining the dissolved-gas concentrations according to the sampling time and the sampling-point temperature to obtain the gas feature vector is: taking the concentrations of the seven gases CO, CO2, H2, CH4, C2H2, C2H4 and C2H6 as the feature channels of an image, and taking the sampling-point temperature and the sampling time as the height and width of the image respectively, to obtain a gas feature vector of dimension C x W x I, where C represents the number of gas species, I represents the temperature interval, i.e. a group of gas concentration data is taken each time the temperature changes by I degrees, and W represents the number of sampling points in each sampling period.
Preferably, the method for enhancing the data of the gas feature vector comprises the following steps: the concentration of dissolved gas of the transformer with the same model under a certain working condition and the average value of the concentrations of various gases are used as supplementary data to be randomly added into the gas characteristic vector, and meanwhile, noise information is added into the gas characteristic vector to obtain an input vector.
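As a minimal NumPy sketch of this enhancement step (illustrative only, not part of the patent): supplementary per-gas mean values stand in for the mean concentrations of same-model transformers and are randomly mixed into the feature vector, after which Gaussian noise is added. The tensor shape, replacement rate and noise scale are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# One gas feature vector: 7 gas channels x 10 temperature bins x 24 sampling points
# (hypothetical dimensions chosen for illustration).
x = rng.random((7, 10, 24))

# Supplementary data: per-gas mean concentration (stands in for the mean values
# of same-model transformers under a given working condition).
gas_mean = x.mean(axis=(1, 2), keepdims=True)          # shape (7, 1, 1)

# Randomly replace about 5% of the entries with the supplementary values.
mask = rng.random(x.shape) < 0.05
augmented = np.where(mask, np.broadcast_to(gas_mean, x.shape), x)

# Add small Gaussian noise so the network also trains on disturbed samples.
augmented = augmented + rng.normal(0.0, 0.01, size=x.shape)
```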
Preferably, the fault evaluation is performed on the input vector, that is, the fault type is labeled on the input vector, and the fault type includes a low-temperature fault, a medium-temperature overheat fault, a low-energy discharge, a high-energy discharge, a partial discharge, a high-temperature overheat fault, a mixture of an overheat fault and a discharge fault, a moisture fault and a normal state.
Preferably, the expression of the soft threshold in the common-channel deep residual shrinkage network is:

Y = X - λ,  X > λ
Y = 0,      -λ ≤ X ≤ λ
Y = X + λ,  X < -λ

where X represents the input feature, Y represents the output feature, and λ is a positive parameter threshold.
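As an illustration only (not part of the patent text), the soft threshold can be sketched in NumPy; values inside [-λ, λ] are zeroed and the remaining values shrink toward zero by λ:

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft threshold: zero inside [-lam, lam], shrink by lam outside.
    Unlike ReLU, negative features beyond the threshold are kept (shifted by +lam)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = soft_threshold(x, 1.0)   # -> [-1.0, 0.0, 0.0, 0.0, 1.0]
```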
Preferably, the soft threshold optimization objective function is:

min over X of (1/2)‖AX - B‖₂² + λ‖X‖₁

wherein A and B each represent a constant matrix.
Preferably, the method for optimizing the soft threshold optimization objective function with the alternating direction method of multipliers (ADMM) and the FISTA algorithm is as follows.

Let AX - B = Z, and rewrite the soft threshold optimization objective function as target I:

min f(X) + g(Z), subject to AX - B = Z

where f(X) = λ‖X‖₁ and g(Z) = (1/2)‖Z‖₂².

The augmented Lagrangian function of target I is:

L_β(X, Z, y) = f(X) + g(Z) + yᵀ(AX - B - Z) + (β/2)‖AX - B - Z‖₂²

where β is the penalty parameter, X is the local variable, Z is the global consistency variable, and y is the Lagrange multiplier (dual variable);

The ADMM iterative updates take the form:

X^(k+1) = argmin over X of L_β(X, Z^k, y^k)
Z^(k+1) = argmin over Z of L_β(X^(k+1), Z, y^k)
y^(k+1) = y^k + β(AX^(k+1) - B - Z^(k+1))

where X^(k+1) and Z^(k+1) denote the optimal points of the two sub-problems after k+1 dual iterations, y^k denotes the Lagrange multiplier at the k-th dual iteration, and β also acts as the dual iteration step size.
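To make the iteration concrete, here is a textbook ADMM sketch for an l1-regularized least-squares objective on synthetic data. It uses the common splitting X = Z with closed-form updates, a standard simplification; the patent's own splitting and per-channel details may differ, so this is an illustration of the technique rather than the patented method.

```python
import numpy as np

def admm_l1(A, B, lam, beta=1.0, iters=200):
    """ADMM for min 0.5*||Ax - B||^2 + lam*||x||_1 with splitting x = z.
    x-update: ridge-type linear solve; z-update: soft threshold; y-update: dual ascent."""
    m, n = A.shape
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA, AtB = A.T @ A, A.T @ B
    inv = np.linalg.inv(AtA + beta * np.eye(n))        # (A^T A + beta*I)^-1
    for _ in range(iters):
        x = inv @ (AtB + beta * z - y)                 # minimize smooth part + penalty
        w = x + y / beta
        z = np.sign(w) * np.maximum(np.abs(w) - lam / beta, 0.0)  # soft threshold
        y = y + beta * (x - z)                         # multiplier (dual) update
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -3.0, 1.5]
B = A @ x_true
x_hat = admm_l1(A, B, lam=0.05)
```

With a small regularization weight the recovered vector stays close to the sparse ground truth while the soft threshold keeps the off-support entries at exactly zero.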
The X-subproblem is an l1-regularized least-squares problem. Using the proximal (near-end) gradient method, a gradient step on its smooth part, obtained by taking the derivative with respect to X, gives:

V^k = (I - (β/N) AᵀA) X^k + (β/N) Aᵀ(B + Z^k - y^k/β)

where I is the identity matrix and N is a step-size constant no smaller than the largest eigenvalue of βAᵀA;

the soft threshold method then updates X:

X^(k+1) = S_(λ/(βN))(V^k)

where S_(λ/(βN))(·) represents the soft-threshold function, solved in simple closed form from the subdifferential of the l1 norm:

S_t(v) = sign(v) · max(|v| - t, 0).

The Z-subproblem is quadratic; setting its derivative to zero yields the closed-form update:

Z^(k+1) = (β(AX^(k+1) - B) + y^k) / (1 + β).
the quadratic approximation function at point Z is:
Figure BDA00032688505000000416
wherein the content of the first and second substances,
Figure BDA00032688505000000417
a gradient factor representing a globally consistent optimal solution;
the minimum of the quadratic approximation function is abbreviated as:
Figure BDA00032688505000000418
in conjunction with the abbreviated minimum expression, the minimum of the quadratic approximation function is written as:
Figure BDA00032688505000000419
the operation of FISTA algorithm starts, and L is0Greater than 0, eta > 1, and X is epsilon of RnSimultaneously order Z1=X0,t1=1;
Finding the smallest non-negative integer ikThen, then
Figure BDA0003268850500000051
Figure BDA0003268850500000055
Will be provided with
Figure BDA0003268850500000052
Substituting until the calculation is completed to obtain:
Figure BDA0003268850500000053
Figure BDA0003268850500000054
wherein, tkIndicating the appropriate step size.
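For illustration only (not the patent's exact implementation), a constant-step FISTA sketch for an l1-regularized least-squares problem on synthetic data; it uses the t-sequence and momentum extrapolation of the Beck-Teboulle scheme, with L fixed to the largest eigenvalue of AᵀA instead of backtracking:

```python
import numpy as np

def fista_l1(A, B, lam, iters=500):
    """FISTA for min 0.5*||Ax - B||^2 + lam*||x||_1 (constant step 1/L).
    Each step: gradient step on the smooth part at the extrapolated point,
    soft-threshold prox, then momentum extrapolation with the t_k sequence."""
    n = A.shape[1]
    L = np.linalg.eigvalsh(A.T @ A).max()    # Lipschitz constant of the gradient
    x = np.zeros(n)
    z = x.copy()
    t = 1.0
    for _ in range(iters):
        x_prev = x
        w = z - A.T @ (A @ z - B) / L        # gradient step
        x = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # prox: soft threshold
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)            # momentum extrapolation
        t = t_next
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(60, 25))
x_true = np.zeros(25)
x_true[[0, 5, 10]] = [1.0, -2.0, 0.5]
B = A @ x_true
x_hat = fista_l1(A, B, lam=0.1)
```

The momentum term is what distinguishes FISTA from plain proximal gradient descent and gives its faster O(1/k²) convergence rate.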
Compared with the prior art, the invention has the following beneficial effects: the invention combines gas concentration, sampling time and sampling-point temperature into a new high-dimensional characteristic-gas vector, which is equivalent to forming an image whose feature channels are the gas concentrations; the improved sub-channel-threshold deep residual shrinkage network trains on and identifies the high-dimensional characteristic-gas vector, ensuring both the running speed of the network and the integrity of the output; the invention optimizes and improves the network, uses the alternating direction method of multipliers to solve the problem of partial solution loss caused by sparse matrix transposition in the network, and uses the FISTA and UMAP algorithms to accelerate the output speed of the network.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 shows basic residual units.
Fig. 2 is a data structure of an input characteristic of the present invention.
FIG. 3 is a structure of a common channel depth residual shrinking network of the present invention; wherein, (a) is a basic module of the depth residual shrinkage network, and (b) is a schematic diagram of the whole structure.
Fig. 4 is an enhancement mode of the training set of the present invention.
FIG. 5 is a parameter setting model of the sub-channel threshold depth residual shrinkage network of the present invention.
FIG. 6 is a flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
The early deep neural networks often faced several problems during learning: when there are too many training parameters, increasing the number of network layers causes network degradation, and the drop in training effectiveness is even more obvious when the parameters are complicated and high-dimensional or the objective function contains many non-convex optimization sub-problems. To solve this, the deep residual network was proposed in [He K M, Zhang X Y, Ren S Q, et al. Deep Residual Learning for Image Recognition [C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. IEEE, 2016: 770-778]. The basic residual unit is shown in fig. 1.
The deep residual network, one kind of convolutional neural network, can be trained on strongly noisy data to achieve noise reduction and can serve as a base network for deep learning. Its soft threshold can act as an adaptive threshold for filtering noise, achieving active noise reduction and overcoming the rigidity and limited applicability of manually set thresholds. The soft threshold method is the key to removing interference, and its expression is:

y = x - τ,  x > τ
y = 0,      -τ ≤ x ≤ τ
y = x + τ,  x < -τ

where x represents the input feature, y represents the output feature, and τ is a positive parameter threshold. The idea is to set feature values close to zero to zero while not deleting the negative features. The derivative of the output with respect to the input is:

∂y/∂x = 1,  x > τ
∂y/∂x = 0,  -τ ≤ x ≤ τ
∂y/∂x = 1,  x < -τ

It zeroes the features whose absolute values are below the threshold and shrinks the other features toward zero. As a nonlinear transformation similar to the ReLU activation function, it can also be used as an activation function. The network acquires useful information through global scanning, enhancing useful information while suppressing redundant information. Like existing traditional neural networks, the network also contains convolution layers, activation functions, normalization, a cross-entropy error function and similar structures. Using convolution layers in place of matrix multiplication greatly reduces the number of training parameters and helps avoid overfitting, giving higher test accuracy. The convolution is expressed as:
y_j = Σ over i ∈ M_j of (x_i * k_ij) + b_j

wherein x_i is the i-th channel of the input feature map, y_j is the j-th channel of the output feature map, k_ij is a convolution kernel, b_j is a bias, and M_j is the set of input channels used to compute the j-th channel of the output feature map. The convolution may be repeated multiple times to obtain the output feature map. To reduce covariate shift in the network, a batch normalization (BN) function is added to the model: as new parameters are continuously input, the network updates the feature information through continued training, so the original feature distribution keeps changing; after BN is added, the model normalizes the features captured before and after it, so that it can adapt to the continuous change of the features while the parameters in the convolution layers keep changing. The primary function of BN is to transform each feature into a normalized distribution and then, during ongoing training, adjust the features toward an ideal distribution. The process is as follows:

μ = (1/N_batch) Σ over n of x_n
σ² = (1/N_batch) Σ over n of (x_n - μ)²
x̂_n = (x_n - μ) / sqrt(σ² + ε)
y_n = γ · x̂_n + β

wherein x_n and y_n represent the input and output features of BN, γ and β are two trainable parameters that scale and shift the distribution, and ε is a constant close to zero.
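The BN process can be sketched as follows: an illustrative NumPy version for a batch of feature vectors, with the trainable γ and β held fixed (in a real network they are learned):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization: normalize each feature over the batch dimension,
    then scale by gamma and shift by beta (both trainable in a real network)."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized distribution
    return gamma * x_hat + beta            # scaled and shifted distribution

rng = np.random.default_rng(3)
x = rng.normal(5.0, 2.0, size=(256, 8))   # batch of 256 samples, 8 features
y = batch_norm(x, gamma=1.5, beta=0.2)
```

After normalization each feature has mean β and standard deviation γ over the batch, regardless of the input distribution.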
The activation function, a common nonlinear transformation, is indispensable in neural networks and prevents the gradient from vanishing to some extent. The ReLU function is expressed as:

y = max(x, 0)

where x and y represent the input and output of the activation function, respectively.
For each channel of the feature map, the network calculates a mean value using global average pooling (GAP). The objective function uses a cross-entropy function, which suits multi-class recognition tasks and brings a higher training success rate. To calculate the cross-entropy error, the feature values must first be brought into the (0, 1) range with the softmax function. The expression is:

y_j = e^(x_j) / Σ over i of e^(x_i)

wherein x_j and y_j are the input and output feature values of the softmax function, i and j index the neurons of the output layer, and N_class is the number of classes. Here, y_j can be seen as the predicted probability of belonging to the j-th class. The error of the cross-entropy function is:

E = - Σ over j of t_j · log(y_j)

wherein t_j is the target output of the j-th class.
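A small NumPy sketch of the softmax and cross-entropy expressions above (illustrative only; the max-subtraction is a standard numerical-stability trick not stated in the text):

```python
import numpy as np

def softmax(x):
    """Map raw feature values into (0, 1) so they sum to 1."""
    e = np.exp(x - x.max())       # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(y_pred, t):
    """E = -sum_j t_j * log(y_j), with t a one-hot target vector."""
    return -np.sum(t * np.log(y_pred + 1e-12))

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
t = np.array([1.0, 0.0, 0.0])     # true class is class 0
loss = cross_entropy(probs, t)    # small when probs favors class 0
```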
As shown in fig. 6, a method for identifying composite imaging of gas in transformer oil based on a depth residual error network specifically comprises the following steps:
the method comprises the following steps: acquiring the concentration of dissolved gas in the transformer oil and the temperature of a sampling point according to the sampling time, and recombining the concentration of the dissolved gas according to the sampling time and the temperature of the sampling point to obtain a gas characteristic vector; the dissolved gas comprises CO and CO2、H2、CH4、C2H2、C2H4And C2H6Seven kinds of gases. The method for recombining the concentration of the dissolved gas according to the sampling time and the temperature of the sampling point to obtain the gas characteristic vector comprises the following steps: mixing CO and CO2、H2、CH4、C2H2、C2H4And C2H6The concentrations of the seven gases are used as a characteristic channel of an image, the temperature and the sampling time of a sampling point are respectively used as the height and the width of the image, and a gas characteristic vector with the dimension of C multiplied by W multiplied by I is obtained, wherein C represents the type of the gas, I represents the temperature interval, namely, a group of gas concentration data is taken when the temperature changes by I degree, and W represents the corresponding number of sampling points in each sampling time.
Because each transformer's characteristics and operating environment differ, any transformer has a unique working condition during operation. These factors include voltage level, structure, materials, temperature, humidity, age, load and so on. In the DGA gas analysis methods now widely applied, the definitive factor for transformer fault diagnosis is the fault characteristic gas concentration; the improved characteristic-gas three-ratio method, for instance, calculates characteristic-gas concentration ratios to analyze whether the transformer has a fault. Accounting for most of the characteristics of every transformer would require a cumbersome process of statistical data classification. To accurately analyze the operating condition of each transformer and avoid big data covering up individual differences, the invention selects the temperature and operating time directly related to gas concentration as the key characteristic factors, so that the gas concentration change of each transformer is generated according to its own conditions and the influence of temperature, without classifying and analyzing many factors. The basis for selecting temperature as a key attribute is that the molecular diffusion coefficient of a substance represents its diffusion capacity: according to Fick's law, the diffusion coefficient is the mass of the substance passing through a unit area per unit time along the diffusion direction, and the interior of the transformer is a mixture of multiple gases, i.e. mixed-gas diffusion, whose diffusion coefficient is expressed as:
D = k · T^(3/2) / ( P · (V_A^(1/3) + V_B^(1/3))² ) · sqrt(1/μ_A + 1/μ_B)

wherein T represents the thermodynamic temperature, P represents the pressure, μ_A and μ_B represent the molecular weights of the two gases, V_A and V_B represent their molar volumes under normal conditions, and k is an empirical constant.
It can be seen that, with pressure unchanged, as the temperature increases the diffusion rate of the gas increases; at the same time, the gas concentration changes with time. Selecting temperature and time as the associated feature attributes of gas concentration therefore ensures that the input features do not bring an excessively high dimensionality to the deep learning network. Moreover, when each transformer's own operating data are used to train a network dedicated to that transformer, the results can be continually adjusted and adapted to the transformer's own characteristics, solving the problem of inter-transformer differences being ignored under large-scale data coverage; this is equivalent to a controlled-variable approach.
Suitable fault gases are selected as the input feature vector, representing how each gas concentration of the transformer varies with temperature on a high-dimensional time sequence during operation at a given voltage level, analogous to the parameter structure of an image, as shown in fig. 2. The variable axis of the invention corresponds to the concentrations of several characteristic gases during transformer operation. Based on the IEEE Guide for the Interpretation of Gases Generated in Oil-Immersed Transformers (IEEE Std C57.104-2008), issued by the IEEE Power & Energy Society, the fault characteristic gases CO, CO2, H2, CH4, C2H2, C2H4 and C2H6 are selected. Because the characteristic gases correspond to the feature channels of an image, while the temperature and time axes are analogous to an image's height and width axes, the invention can learn the features of fault gases in transformer oil using a deep residual network from the field of image classification, and by processing the gas-concentration feature sequence, fault classification and prediction results can be obtained for each transformer under different temperatures and operating times. The initial input features are shown in table 1.
TABLE 1 input characteristics
The input data are arranged in a C × W × I format. C is the number of transformer fault gases: CO, CO2, H2, CH4, C2H2, C2H4 and C2H6, seven gases in total, used as the feature channels, so the channel number is 7. I is the temperature interval, set here to 1 °C, i.e. one group of gas-concentration data is taken for each 1 °C change in temperature; this value can be modified for different monitoring-equipment specifications. W is the number of data-acquisition points per minute: with one sample every 10 s, there are 6 sampling points per minute at 10 s intervals. This configuration ensures that the invention can identify and predict the characteristic gas signals hierarchically on a time scale as the temperature changes. The observed variable could alternatively be the phase velocity or absolute velocity of gas diffusion; the invention adopts gas concentration as the observation attribute, which avoids mutual interference between gases and complications from gases dissolving in the oil. In addition, under different working conditions the concentrations of the gases corresponding to different fault types also change, so using the characteristic gas concentrations as the acquired signal also handles the case where irrelevant gas concentrations form noise interference during a fault, making the network focus more on the features of the fault target gases.
Step two: perform data enhancement on the gas feature vector to form the input vector, and perform fault assessment on the input vector to obtain a sample set comprising a training set and a test set. Data enhancement is performed as follows: the dissolved-gas concentrations of transformers of the same model under a given working condition, together with the mean values of the various gas concentrations, are randomly added to the gas feature vector as supplementary data, and noise information is also added, yielding the input vector. Fault assessment of the input vector means labelling it with a fault type; the fault types comprise low-temperature fault, medium-temperature overheating fault, low-energy discharge, high-energy discharge, partial discharge, high-temperature overheating fault, mixed overheating and discharge fault, insulation-moisture fault (mixed electro-thermal fault), and the normal state.
In the field of image recognition, an appropriate rotation of an image does not change its features but can be fed in as new data to enrich the data set, compensating for prior information loss while providing noise resistance, and to some extent preventing overfitting. To enhance the robustness of the model, the invention secures sufficient input data and model reliability through data enhancement: data from transformers of the same model under a given working condition are randomly added to the data set as supplements, and the input for such a random variable can be the mean value of the related variable. Meanwhile, if the added noise data are small in magnitude, they do not change the network's recognition and classification result, and when the data set is small, adding noise information serves to enrich it. The training set after initial parameter enhancement is shown in fig. 3.
For the initial training parameter set, Gaussian white noise δ ~ N(0, ε) is applied to the input variable to obtain the new training parameters. The new training input variable X* is:

X* = X × (1 + δ), δ ~ N(0, ε)

where the variance ε can be set artificially according to past statistical data. By enhancing the input feature data in this way, the training set under information-loss conditions can be expanded several-fold; note, however, that the expansion factor should be chosen according to the training effect and should not be too large, otherwise computational efficiency suffers. Processing the input feature parameters in this way yields the enhanced training feature sample set.
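A minimal sketch of this multiplicative-noise enhancement (the variance `eps`, the number of copies and the function name are illustrative assumptions, not values fixed by the method):

```python
import numpy as np

def augment_with_noise(X, eps=0.01, copies=4, seed=0):
    """Multiplicative Gaussian-noise augmentation: X* = X * (1 + delta),
    delta ~ N(0, eps). Returns the original sample plus `copies` noisy
    variants stacked along a new leading axis."""
    rng = np.random.default_rng(seed)
    out = [X]
    for _ in range(copies):
        delta = rng.normal(0.0, np.sqrt(eps), size=X.shape)
        out.append(X * (1.0 + delta))
    return np.stack(out)

X = np.ones((7, 6, 2))   # one gas-feature sample (C x W x I)
aug = augment_with_noise(X)
print(aug.shape)          # (5, 7, 6, 2): original + 4 noisy copies
```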
The samples are divided into training and test sets according to transformer model and voltage level. Considering routine maintenance and the incomplete data of some new transformers, the reference data are enhanced to supplement the samples, and the experiment uses cross-validation: each data set is divided into 10 subsets, with 1 subset used as the test set and 9 as the training set. Transformers that developed faults during the experiment were fault-assessed and repaired by station staff. The network model was built in the Python language, on a computer with an Intel i7-9750 central processing unit and an NVIDIA GeForce GTX graphics processor.
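The 10-subset cross-validation split can be sketched with a generic k-fold index generator (this sampler is an illustration, not code from the patent):

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k shuffled folds; each fold serves once
    as the test set while the remaining k-1 folds form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

splits = list(kfold_indices(100, k=10))
print(len(splits))                            # 10 train/test splits
print(len(splits[0][0]), len(splits[0][1]))   # 90 10
```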
For different types of fault problems, concentration change conditions of seven gases at different temperatures are considered in the experiment, and fault classifications are shown in table 2:
TABLE 2 nine experimental states of transformer DGA failure
Step three: construct a soft-threshold optimization objective function based on the soft threshold in the common-channel deep residual shrinkage network, optimize it using the alternating direction method of multipliers (ADMM) and the FISTA algorithm, and take the common-channel deep residual shrinkage network with the optimized soft threshold as the sub-channel-threshold deep residual shrinkage network.
the common channel depth residual error network adopted by the invention is provided by the documents [ ZHAO M, ZHONG S, FUX, et al. deep residual reducing transformer networks for fault diagnosis [ J ]. IEEE Transactions on Industrial information, 2020,16(7):4681 and 4690 ] when the bearing fault diagnosis is carried out, the network can effectively diagnose the fault, and the structure of the common channel depth residual error shrinkage network is shown in figure 3.
The common-channel depth residual shrinkage network comprises an input layer, a convolution layer, a depth residual shrinkage network basic module, a global mean pooling layer and a fully-connected output layer, and the construction process is as follows:
First, the basic module of the deep residual shrinkage network is constructed, with a sub-network embedded in the module to set the threshold required by soft thresholding automatically; a residual shrinkage module that shares the threshold among channels is adopted. The input feature map first passes twice through batch normalization, the ReLU activation function and a convolution layer. Then the absolute values of all features are computed, and their mean is taken as the feature F; on the other path, after global average pooling of the absolute feature values, the result is fed into a two-layer fully connected network, whose output is normalized to between 0 and 1 by a Sigmoid function to give a scale parameter α, so that the final threshold is α × F. Finally, the original input feature map is added back via the identity operation in TensorFlow and returned;
the structure input layer receives external input of the neural network model and transmits the external input to the convolutional layer, wherein the external input is a gas characteristic vector;
constructing a convolutional layer, receiving the output of an input layer by the convolutional layer, repeating the convolution operation to obtain a feature map, and transmitting the feature map to a depth residual shrinkage network basic module;
stacking a depth residual shrinkage network basic module, wherein the characteristic graph output by the convolution layer is processed by the depth residual shrinkage network module and then transmitted to a batch standardization layer;
constructing a batch normalization layer, the ReLU activation function and a global average pooling layer;
constructing a fully connected output layer, receiving the output from the global averaging layer;
and the fully-connected output layer corresponds to all classes contained in the sample data, the output value is the probability value of the sample belonging to each class, and the class corresponding to the maximum output value is taken as the sample class of model prediction.
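The channel-shared threshold computation described above can be sketched as a numpy forward pass. This is a simplified illustration only: the batch-normalization, ReLU and convolution stages of the residual path are omitted, and the fully connected weights `w1, b1, w2, b2` are random stand-ins rather than trained parameters:

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft thresholding by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def shrinkage_block(feature_map, w1, b1, w2, b2):
    """Forward pass of a channel-shared-threshold shrinkage unit.

    feature_map : (C, H, W). The threshold sub-network computes
    GAP(|F|) -> FC+ReLU -> FC -> sigmoid -> alpha, and the shared
    threshold is alpha * mean(|F|); the residual path here is just
    the soft-thresholded input, for brevity, plus the identity shortcut.
    """
    absF = np.abs(feature_map)
    gap = absF.mean(axis=(1, 2))                    # global average pool -> (C,)
    h = np.maximum(w1 @ gap + b1, 0.0)              # first FC layer + ReLU
    alpha = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # sigmoid scale in (0, 1)
    tau = alpha.item() * absF.mean()                # shared threshold alpha * F
    return feature_map + soft_threshold(feature_map, tau)  # identity shortcut

C = 7
rng = np.random.default_rng(0)
F = rng.normal(size=(C, 4, 4))
out = shrinkage_block(F, rng.normal(size=(C, C)), np.zeros(C),
                      rng.normal(size=(1, C)), np.zeros(1))
print(out.shape)  # (7, 4, 4)
```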
The salient difference between the common-channel deep residual shrinkage network and the traditional deep residual network is the introduction of a soft threshold, which is both an effective method for removing noise signals and a key step of the network. Its principle is to set transformed signal values in the band near 0 to zero, and it adapts automatically as the input feature parameters change. Traditional signal-denoising methods include wavelet denoising, hard-threshold denoising and soft-threshold denoising. Hard thresholding, however, produces jitter and non-smoothness, and, more importantly, requires the threshold to be determined from expert experience or statistics, so it lacks objectivity to some degree and is rigid: generally, as a transformer's running time increases, the degradation of its internal insulation and its losses intensify, so a fixed hard threshold suffers from being idealized. Traditional wavelet denoising, for its part, demands considerable signal-processing skill and is difficult. Denoising with a gradient-descended soft threshold is therefore the best choice. The soft threshold in the common-channel deep residual shrinkage network is expressed as:
Y = \begin{cases} X - \lambda, & X > \lambda \\ 0, & -\lambda \le X \le \lambda \\ X + \lambda, & X < -\lambda \end{cases}
where X represents the input characteristic, Y represents the output characteristic, and λ is a positive parameter threshold.
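Numerically, the soft-threshold operation behaves as follows (a direct elementwise implementation):

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft thresholding: shrink each value toward zero by
    lam, zeroing everything inside the band [-lam, lam]."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.5])
# values inside [-1, 1] are zeroed; the rest are shrunk toward zero by 1
print(soft_threshold(x, 1.0))
```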
In use, however, solving for the soft threshold fundamentally involves products of A, its transpose and x. Moreover, although the soft threshold is continuous and smooth, its constant bias can cause some data to be missed, and it may give a lower signal-to-noise ratio than a hard threshold. The invention addresses this by using backtracking within the alternating direction method of multipliers. At the same time, to accelerate the soft-threshold solution under large data volumes, the FISTA algorithm is incorporated into the network's soft-threshold computation.
The soft threshold optimization objective function is:
\min_X \; \frac{1}{2}\|AX - B\|_2^2 + \lambda\|X\|_1
wherein, A and B both represent any constant matrix and can be set according to actual conditions.
According to the definition of norm, decomposing the expression of the soft threshold optimization objective function, and comparing the relation between B and lambda to obtain the general form solution;
X_i^* = \operatorname{sign}(B_i)\,\max\left(|B_i| - \lambda,\, 0\right)
The result is a matrix of soft thresholds, solved and optimized as a number of independent functions. High-dimensional feature solution therefore involves the transposition of A and x in a sparse matrix and the solution of a large-scale convex optimization problem. In determining the threshold, the network solves by seeking local optima. Because the asymptotic behaviour of the function leaves a persistent deviation, this approach cannot always judge the exact relation between a measured value and the fault threshold accurately and flexibly. For example, with 1 set as the fault threshold and 0.9 and 0.99 as measured values, under different standards 0.9 might be judged an early-warning state while 0.99 is judged a normal state. Such locally sought solutions therefore tend to lose edge solutions and to amplify the constant deviation during the solution process.
For this convex optimization problem, each subproblem is optimized and solved with the alternating direction method of multipliers, and the subproblem optima are finally coordinated into a globally coordinated optimal solution, which resolves the incompleteness and feature omission of the soft threshold to a certain extent. The specific steps are as follows:
Let AX − B = Z and rewrite the soft-threshold optimization objective function as target I:
\min_{X,Z}\; f(X) + g(Z), \quad \text{s.t. } AX - Z = B
wherein f(X) = \|X\|_1 and g(Z) = \frac{1}{2}\|Z\|_2^2.
The augmented Lagrangian function for target I is:
L_\beta(X, Z, y) = f(X) + g(Z) + y^\top (AX - Z - B) + \frac{\beta}{2}\|AX - Z - B\|_2^2
wherein β is the penalty parameter, X represents the local variable, and Z represents the global-consistency variable. Under the constraint, the local variables must remain consistent: the whole data set is divided across nodes, each node is trained independently, and the iterations finally converge to a globally consistent solution. The rewritten fault threshold is thereby determined more accurately and flexibly. For example, with a fault threshold of 1, after identifying a large number of samples the model captures a state difference of 0.1; if the difference between a measured value and the fault threshold is greater than 0.1, the state is judged normal. Together with an adaptive variable-weight cross-entropy function, the weight (threshold) precision of each characteristic gas in a fault can be improved effectively, and the threshold is guaranteed to update itself as new data arrive and the equipment state changes.
The form of the ADMM iterative update is as follows:
X^{k+1} = \arg\min_X \; L_\beta\left(X, Z^k, y^k\right)

Z^{k+1} = \arg\min_Z \; L_\beta\left(X^{k+1}, Z, y^k\right)

y^{k+1} = y^k + \beta\left(AX^{k+1} - Z^{k+1} - B\right)
wherein A_i and B_i denote the i-th elements of the corresponding constant matrices, X_i denotes the i-th variable, X_i^{k+1} denotes the optimum of the minimization variable X_i after k+1 dual iterations, (y^k)^T denotes the transpose of the Lagrange multiplier after k dual iterations, i.e. the iteration step, y^k denotes the Lagrange multiplier at the k-th dual iteration, y^{k+1} denotes the dual-variable update, Z^k denotes the variable Z after k dual iterations, Z_i denotes the i-th global-consistency optimal-solution variable, and Z_i^{k+1} denotes the optimum of the i-th global-consistency optimal solution after k+1 dual iterations.
Updating X^{k+1} by the proximal gradient descent method, or by differentiating with respect to X, gives:

X^{k+1} = \left(A^\top A + \beta I\right)^{-1}\left(A^\top B + \beta Z^k - y^k\right)

wherein I is an identity matrix;
Updating Z^{k+1} by the soft-threshold method gives:

Z^{k+1} = S_{\lambda/\beta}\left(X^{k+1} + \frac{y^k}{\beta}\right)

wherein S_{\lambda/\beta}(\cdot) denotes solving the function in a simple closed form by means of the sub-differential, namely

S_{\lambda/\beta}(x) = \operatorname{sign}(x)\,\max\left(|x| - \frac{\lambda}{\beta},\, 0\right)
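The ADMM iteration above can be sketched for a lasso-type instance of the objective. Note this uses the standard x/z splitting (x − z = 0, in the style of Boyd's lasso ADMM) rather than the patent's AX − Z = B constraint, so it is an illustrative analogue, not the exact update of the invention:

```python
import numpy as np

def soft(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, b, lam=0.1, beta=1.0, iters=200):
    """ADMM for min 0.5||Ax-b||^2 + lam||z||_1 s.t. x - z = 0.
    The x-update is a ridge-type solve (note the beta*I term), the
    z-update is a soft-threshold step, and the y-update is dual ascent."""
    m, n = A.shape
    x = np.zeros(n); z = np.zeros(n); y = np.zeros(n)
    Atb = A.T @ b
    P = np.linalg.inv(A.T @ A + beta * np.eye(n))
    for _ in range(iters):
        x = P @ (Atb + beta * z - y)          # quadratic subproblem
        z = soft(x + y / beta, lam / beta)    # soft-threshold subproblem
        y = y + beta * (x - z)                # multiplier (dual) update
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 10))
x_true = np.zeros(10); x_true[[1, 4]] = [2.0, -3.0]
b = A @ x_true
x_hat = admm_lasso(A, b, lam=0.05)
print(np.round(x_hat, 1))   # sparse estimate close to x_true
```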
it can be seen that in the conventional soft threshold calculation method, each iteration of the step needs a times its transpose, and then a soft threshold step is performed, and by continuous near-end gradient descent, a local minimum is obtained when the absolute value of the gradient is infinitely close to 0. In the operation, the Lipschitz constant of the fixed step length is not necessarily known to be calculated, the constant depends on the transposition of a noise matrix multiplied by the maximum value of a noise signal, the optimization problem of the L1 norm is also involved, when the input data is large, the calculation amount is also large, and in order to solve the problem, the soft threshold value is calculated through the FISTA algorithm with the backtracking function.
With F(X) = f(X) + g(X), the quadratic approximation function at point Z is:

Q_L(X, Z) = f(Z) + \langle X - Z,\, \nabla f(Z) \rangle + \frac{L}{2}\|X - Z\|^2 + g(X)

wherein \nabla f(Z) represents the gradient factor of the globally consistent optimal solution.
Since the constant terms have no effect on the result, ignoring them, the minimizer of the quadratic approximation function is abbreviated as:

p_L(Z) = \arg\min_X \left\{ g(X) + \frac{L}{2}\left\|X - \left(Z - \frac{1}{L}\nabla f(Z)\right)\right\|^2 \right\}

Combined with the abbreviated minimum expression, the minimizer of the quadratic approximation function is written as:

p_L(Z) = S_{\lambda/L}\left(Z - \frac{1}{L}\nabla f(Z)\right)
the operation of FISTA algorithm is started, and L is0Greater than 0, eta > 1, and X ∈ RnSimultaneously order Z1=X0,t1=1;
Finding the smallest non-negative integer ikThen, then
Figure BDA0003268850500000143
Figure BDA0003268850500000148
Will be provided with
Figure BDA0003268850500000144
Substituting until the calculation is completed to obtain:
Figure BDA0003268850500000145
Figure BDA0003268850500000146
wherein, tkIndicating the appropriate step size.
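A minimal numpy sketch of FISTA for the same lasso-type objective; for brevity it uses a fixed step 1/L with L = λ_max(AᵀA) instead of the backtracking search over η^{i_k} described above:

```python
import numpy as np

def soft(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def fista(A, b, lam=0.05, iters=300):
    """FISTA for min 0.5||Ax-b||^2 + lam||x||_1 with a fixed step 1/L.
    The backtracking line search (growing L by eta until the
    majorization test holds) is omitted here for brevity."""
    L = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1]); z = x.copy(); t = 1.0
    for _ in range(iters):
        x_new = soft(z - A.T @ (A @ z - b) / L, lam / L)   # prox step p_L(z)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0   # momentum schedule
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)      # extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 12))
x_true = np.zeros(12); x_true[[0, 7]] = [1.5, -2.0]
b = A @ x_true
x_hat = fista(A, b)
print(np.round(x_hat, 1))   # sparse estimate close to x_true
```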
There is at present no optimal standard for setting hyperparameters such as the number of network layers and the number of convolution kernels, so they are set according to current conventional recommendations. The parameter settings of the sub-channel-threshold deep residual shrinkage network adopted for transformer fault identification are based on the ResNet34 framework, as shown in FIG. 5; the parameters of each residual block in FIG. 5 are listed in Table 3, and the parameter setting of the network is completed in the parameter-debugging stage through the process of FIG. 5.
TABLE 3 setting of residual blocks
Step four: input the training set into the sub-channel-threshold deep residual shrinkage network for training to obtain the trained network model. The model parameters are initialized (see the hyperparameter settings). The network output is 9 neurons, comprising 1 normal state and 8 fault states, as shown in Table 2. During training, the learning rate varies with the epoch: 0.1 for the first 20 epochs, 0.03 for the next 20, then 0.01 and 0.003 for the middle 20-epoch stages, and 0.001 for the final 20 epochs, so that the parameter update step is large at the start of training and the optimal output is obtained. A penalty term is added to the objective function together with L2 regularization to avoid overfitting during training, with the penalty coefficient set to 0.0001, the value conventionally adopted for deep neural networks. Finally, the output layer has 9 neurons, i.e. 1 healthy state and 8 fault states.
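One reading of the training-rate schedule above, sketched as a piecewise-constant function of the epoch (the 20-epoch stage boundaries are as stated in the text; the total of 100 epochs is an inference from them):

```python
def lr_schedule(epoch):
    """Piecewise-constant learning-rate schedule: five 20-epoch stages
    at 0.1, 0.03, 0.01, 0.003 and 0.001, matching the stages described
    above; epochs past the last boundary keep the final rate."""
    stages = [(20, 0.1), (40, 0.03), (60, 0.01), (80, 0.003), (100, 0.001)]
    for bound, lr in stages:
        if epoch < bound:
            return lr
    return 0.001

print([lr_schedule(e) for e in (0, 25, 45, 65, 85)])
# [0.1, 0.03, 0.01, 0.003, 0.001]
```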
Step five: and inputting the test set into a subchannel threshold depth residual shrinkage network model for identification, and performing visual dimensionality reduction on an output result by using a UMAP algorithm to obtain a fault type.
Specific examples
From 29/2020 to 11/8/2020, the method of the present invention was tested in field operation on transformers of a branch of the State Grid; the test results are shown in table 4.
TABLE 4 Transformer characteristic gas concentration
The proposed identification network of the invention issued an alarm signal at 08:30 AM on the date in question, indicating the possible presence of arcing. At thirty minutes past the hour in the afternoon, the transformer tripped on heavy-gas protection, and only then did the power-plant monitoring equipment capture this signal. The staff then analysed the oil chromatography data from inside the transformer and concluded that the fault-gas concentrations matched the three-ratio code combination 102, meaning that an arc fault had occurred inside the transformer. The cause of this failure was cracking of an insulating pad due to its own defects. With continued operation of the transformer and a change of load (main transformer No. 1 was returned to the factory for maintenance, so the load on transformer No. 2 increased), the insulation damage accelerated; a short circuit then occurred, with fault currents possibly reaching several hundred amperes, generating high temperatures that burned out the faulty insulation pad and finally caused arc discharge. When the alarm signal was issued, the internal fault had already entered its incubation stage: the characteristic gases in the transformer had changed slightly, but their concentrations did not yet meet the conditions of the three-ratio method or the accuracy requirements of the monitoring system.
At noon on 6 July 2020, the method detected large simultaneous rises in the concentrations of hydrogen and alkane gases in a transformer. From the transformer's operating regulations and equipment observations, it was estimated that water had entered the transformer, that continuous spark discharge (short circuiting) was occurring inside it, and that the fault-gas concentrations were changing markedly. In the evening, the transformer exploded. Power-off maintenance revealed that the transformer's internal insulation had been damaged and that a large amount of rainwater had permeated into it, causing the accident.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A transformer oil gas composite imaging identification method based on a depth residual error network is characterized by comprising the following steps:
the method comprises the following steps: acquiring the concentration of dissolved gas in the transformer oil and the temperature of a sampling point according to the sampling time, and recombining the concentration of the dissolved gas according to the sampling time and the temperature of the sampling point to obtain a gas characteristic vector;
step two: performing data enhancement on the gas characteristic vector to be used as an input vector, and performing fault assessment on the input vector to obtain a sample set, wherein the sample set comprises a training set and a testing set;
step three: constructing a soft threshold optimization objective function based on a soft threshold in the common channel depth residual shrinkage network, optimizing the soft threshold optimization objective function by using a cross direction multiplier and a FISTA algorithm, and taking the common channel depth residual shrinkage network corresponding to the optimized soft threshold as a sub-channel threshold depth residual shrinkage network;
step four: inputting the training set into a subchannel threshold depth residual shrinkage network for training to obtain a subchannel threshold depth residual shrinkage network model;
step five: inputting the test set into a subchannel threshold depth residual shrinkage network model for identification, and performing visual dimensionality reduction on an output result by using a UMAP algorithm to obtain a fault type;
the method for optimizing the soft threshold optimization objective function by using the cross direction multiplier and the FISTA algorithm comprises the following steps:
let AX-B be Z, rewrite the soft threshold optimization objective function to target I:
\min_{X,Z}\; f(X) + g(Z)
s.t AX-Z=B;
wherein f(X) = \|X\|_1 and g(Z) = \frac{1}{2}\|Z\|_2^2;
The augmented Lagrangian function for target I is:
L_\beta(X, Z, y) = f(X) + g(Z) + y^\top (AX - Z - B) + \frac{\beta}{2}\|AX - Z - B\|_2^2
wherein, beta is a penalty function, x represents the optimal solution of the local variable, and z represents the optimal solution of the global consistency variable;
the form of iterative update of ADMM is as follows:
X^{k+1} = \arg\min_X \; L_\beta\left(X, Z^k, y^k\right)

Z^{k+1} = \arg\min_Z \; L_\beta\left(X^{k+1}, Z, y^k\right)

y^{k+1} = y^k + \beta\left(AX^{k+1} - Z^{k+1} - B\right)
wherein A_i and B_i denote the i-th elements of the corresponding constant matrices, X_i denotes the i-th variable, X_i^{k+1} denotes the optimum of the minimization variable X_i after k+1 dual iterations, (y^k)^T denotes the transpose of the Lagrange multiplier after k dual iterations, i.e. the iteration step, y^k denotes the Lagrange multiplier at the k-th dual iteration, y^{k+1} denotes the dual-variable update, Z^k denotes the variable Z after k dual iterations, Z_i denotes the i-th global-consistency optimal-solution variable, and Z_i^{k+1} denotes the optimum of the i-th global-consistency optimal solution after k+1 dual iterations;
updating X^{k+1} by the proximal gradient descent method, or by differentiating with respect to X, to obtain:

X^{k+1} = \left(A^\top A + \beta I\right)^{-1}\left(A^\top B + \beta Z^k - y^k\right)

wherein I is an identity matrix;
updating Z^{k+1} by the soft-threshold method to obtain:

Z^{k+1} = S_{\lambda/\beta}\left(X^{k+1} + \frac{y^k}{\beta}\right)

wherein S_{\lambda/\beta}(\cdot) denotes solving the function in a simple closed form by means of the sub-differential, namely

S_{\lambda/\beta}(x) = \operatorname{sign}(x)\,\max\left(|x| - \frac{\lambda}{\beta},\, 0\right)

and, with F(X) = f(X) + g(X),
the quadratic approximation function at point Z is:
Figure FDA0003537567510000026
wherein the content of the first and second substances,
Figure FDA0003537567510000027
a gradient factor representing a globally consistent optimal solution;
the minimum of the quadratic approximation function is abbreviated as:
Figure FDA0003537567510000028
in conjunction with the abbreviated minimum expression, the minimum of the quadratic approximation function is written as:
Figure FDA0003537567510000029
the operation of FISTA algorithm starts, and L is0Greater than 0, eta > 1, and X ∈ RnSimultaneously order Z1=X0,t1=1;
Finding the smallest non-negative integer ikThen, then
Figure FDA00035375675100000210
Figure FDA00035375675100000211
Will be provided with
Figure FDA00035375675100000212
Substituting until the calculation is completed to obtain:
Figure FDA00035375675100000213
Figure FDA00035375675100000214
wherein, tkIndicating the appropriate step size.
2. The transformer oil gas composite imaging identification method based on a depth residual network according to claim 1, wherein the dissolved gas comprises seven gases: CO, CO2, H2, CH4, C2H2, C2H4 and C2H6.
3. The transformer oil gas composite imaging identification method based on a depth residual network according to claim 2, wherein the method for recombining the dissolved-gas concentrations according to the sampling time and the sampling-point temperature to obtain the gas feature vector comprises: taking the concentrations of the seven gases CO, CO2, H2, CH4, C2H2, C2H4 and C2H6 as the feature channels of an image, and the sampling-point temperature and the sampling time respectively as the image height and width, to obtain a gas feature vector of dimension C × W × I, wherein C represents the number of gas species, I represents the temperature interval, i.e. one group of gas-concentration data is taken for every I-degree change in temperature, and W represents the number of sampling points within each sampling period.
4. The transformer oil gas composite imaging identification method based on a depth residual network according to claim 2 or 3, wherein the method for enhancing the gas feature vector comprises: randomly adding, as supplementary data, the dissolved-gas concentrations of transformers of the same model under a given working condition, together with the mean values of the various gas concentrations, to the gas feature vector, and simultaneously adding noise information, to obtain the input vector.
5. The method for identifying the gas in the transformer oil based on the deep residual error network as claimed in claim 1, wherein the fault evaluation is performed on the input vector, that is, the fault type is labeled on the input vector, and the fault type includes a low temperature fault, a medium temperature overheating fault, a low energy discharge, a high energy discharge, a partial discharge, a high temperature overheating fault, a mixture of the overheating fault and the discharge fault, a moisture fault and a normal state.
6. The method for identifying the gas in transformer oil composite imaging based on the depth residual error network as claimed in claim 1, wherein the expression of the soft threshold in the common channel depth residual error shrinkage network is as follows:
Y = \begin{cases} X - \lambda, & X > \lambda \\ 0, & -\lambda \le X \le \lambda \\ X + \lambda, & X < -\lambda \end{cases}
where X represents the input characteristic, Y represents the output characteristic, and λ is a positive parameter threshold.
7. The transformer oil gas composite imaging identification method based on the depth residual error network according to claim 6, wherein the soft threshold optimization objective function is:

    f(Y) = (1/2)·||AY − B||_2^2 + λ·||Y||_1

wherein A and B each represent a constant matrix.
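In the separable case where the constant matrix multiplying the variable is the identity, an l1-regularized least-squares objective of this kind decouples elementwise into min_y (1/2)(y − x)² + λ|y|, whose minimizer is exactly the soft threshold of claim 6. A quick numeric check under that assumption (the grid search is purely illustrative):

```python
import numpy as np

def objective(y, x, lam):
    # elementwise l1-regularized least-squares term (identity-matrix case)
    return 0.5 * (y - x) ** 2 + lam * np.abs(y)

x, lam = 2.3, 1.0
grid = np.linspace(-5.0, 5.0, 100001)          # step 1e-4
y_star = grid[np.argmin(objective(grid, x, lam))]
y_soft = np.sign(x) * max(abs(x) - lam, 0.0)   # soft threshold of x
# y_star agrees with y_soft to within the grid spacing
```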
CN202111095213.3A 2021-09-17 2021-09-17 Transformer oil gas composite imaging identification method based on depth residual error network Active CN113743534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111095213.3A CN113743534B (en) 2021-09-17 2021-09-17 Transformer oil gas composite imaging identification method based on depth residual error network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111095213.3A CN113743534B (en) 2021-09-17 2021-09-17 Transformer oil gas composite imaging identification method based on depth residual error network

Publications (2)

Publication Number Publication Date
CN113743534A CN113743534A (en) 2021-12-03
CN113743534B true CN113743534B (en) 2022-06-07

Family

ID=78739843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111095213.3A Active CN113743534B (en) 2021-09-17 2021-09-17 Transformer oil gas composite imaging identification method based on depth residual error network

Country Status (1)

Country Link
CN (1) CN113743534B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797567A (en) * 2020-06-09 2020-10-20 合肥工业大学 Deep learning network-based bearing fault classification method and system
CN112115638A (en) * 2020-08-28 2020-12-22 合肥工业大学 Transformer fault diagnosis method based on improved Adam algorithm optimization neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657446B2 (en) * 2017-06-02 2020-05-19 Mitsubishi Electric Research Laboratories, Inc. Sparsity enforcing neural network
CN111832637B (en) * 2020-06-30 2022-08-30 南京邮电大学 Distributed deep learning classification method based on alternating direction multiplier method ADMM
CN112163574A (en) * 2020-11-23 2021-01-01 南京航天工业科技有限公司 ETC interference signal transmitter identification method and system based on deep residual error network


Also Published As

Publication number Publication date
CN113743534A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
Phillips et al. Classifying machinery condition using oil samples and binary logistic regression
Mao et al. Investigation of polymer electrolyte membrane fuel cell internal behaviour during long term operation and its use in prognostics
Long et al. Improved diagnostics for the incipient faults in analog circuits using LSSVM based on PSO algorithm with Mahalanobis distance
CN111210024A (en) Model training method and device, computer equipment and storage medium
CN109460618A (en) A kind of rolling bearing remaining life on-line prediction method and system
CN110929847A (en) Converter transformer fault diagnosis method based on deep convolutional neural network
CN109000930A (en) A kind of turbogenerator performance degradation assessment method based on stacking denoising self-encoding encoder
CN106446317A (en) Mathematic model-based sealed relay storage life prediction method
CN113782113B (en) Method for identifying gas fault in transformer oil based on deep residual error network
CN115225516B (en) LSSVM network flow prediction method based on improved ABC-VMD
Jha et al. Application of artificial intelligence techniques for dissolved gas analysis of transformers-A review
CN111273125A (en) RST-CNN-based power cable channel fault diagnosis method
Kaur et al. Fault detection in power transformers using random neural networks
CN104077493A (en) Method for constructing state evaluation index system of electric relaying protection system
CN114462508A (en) Power transformer health state assessment method based on multi-mode neural network
CN116610998A (en) Switch cabinet fault diagnosis method and system based on multi-mode data fusion
Lu et al. Grey relational analysis using Gaussian process regression method for dissolved gas concentration prediction
CN117371207A (en) Extra-high voltage converter valve state evaluation method, medium and system
Xing et al. Health evaluation of power transformer using deep learning neural network
CN111586728A (en) Small sample characteristic-oriented heterogeneous wireless network fault detection and diagnosis method
Sadoughi et al. A deep learning approach for failure prognostics of rolling element bearings
Leauprasert et al. Intelligent machine learning techniques for condition assessment of power transformers
CN114595883A (en) Oil-immersed transformer residual life personalized dynamic prediction method based on meta-learning
US11693041B2 (en) Method for monitoring the electric insulation status of a piece of equipment for MV or HV electric systems
CN113743534B (en) Transformer oil gas composite imaging identification method based on depth residual error network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant