CN114019282A

CN114019282A - Transformer fault diagnosis method based on principal component analysis and random forest fusion

Info

Publication number: CN114019282A
Application number: CN202111298905.8A
Authority: CN
Inventors: 陈龙谭; 于虹; 李�昊; 王宣军
Original assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2022-02-08

Abstract

The transformer fault diagnosis method based on principal component analysis and random forest fusion proceeds as follows. The oil chromatograms of transformers with known fault types are detected, and the extracted fault oil chromatographic data are integrated into a fault oil chromatographic data set. Ratio-based dimension increase is applied to this data set to obtain a first fault oil chromatographic data set, and the fault types are encoded. A principal component analysis model eliminates the correlation among the dimensional features and is tuned so that 8 principal features remain; these form a second fault oil chromatographic data set. The second data set is divided into a training set and a test set at a ratio of 0.8:0.2. A tuned random forest classification model is trained with the training set, giving the first random forest classification model, and its fault diagnosis accuracy is checked with the test set. When that accuracy is not less than the set fault diagnosis accuracy, the model becomes the final random forest classification model. Newly detected transformer oil chromatographic data are then input into this final model to obtain a diagnosis result. Combining ratio-based dimension increase with the principal component analysis model fully mines the information in the fault oil chromatographic data set and improves the accuracy of fault diagnosis.

Description

Transformer fault diagnosis method based on principal component analysis and random forest fusion

Technical Field

The present application relates to the field of transformer fault diagnosis, and in particular to a transformer fault diagnosis method based on principal component analysis and random forest fusion.

Background

A power transformer is a stationary electrical device that transforms an AC voltage of a given value into one or more voltages of different values at the same frequency. It is one of the most important devices in a power system: power plants and substations use power transformers to step the voltage up or down to the level required by each consumption area. Power transformer failures reduce power supply reliability, so a failed transformer must be repaired promptly.

To enable timely repair, the prior art determines the fault type of a power transformer from the composition and content of the gases dissolved in the transformer oil, using dissolved gas analysis combined with the three-ratio method. However, the three-ratio method extracts the useful information in a fault data set inefficiently. As a result, the fault type of the power transformer is judged with low accuracy.

Disclosure of Invention

The present application provides a transformer fault diagnosis method based on principal component analysis and random forest fusion, to solve the technical problem of low accuracy in judging the fault type of a power transformer.

To solve this technical problem, the embodiments of the present application disclose the following technical solutions:

In a first aspect, the embodiments of the present application disclose a transformer fault diagnosis method based on principal component analysis and random forest fusion, comprising: detecting the oil chromatograms of transformers whose fault types are known, extracting the fault oil chromatographic data of the detected faulty transformers, and integrating the data into a fault oil chromatographic data set;

performing ratio-based dimension increase on the fault oil chromatographic data set, taking the resulting data set as a first fault oil chromatographic data set, and encoding the fault types;

judging the correlation among the dimensional features according to a correlation heat map of the 36 dimensional features in the first fault oil chromatographic data set;

establishing a principal component analysis model, eliminating the correlation among the dimensional features with the model, and tuning it so that 8 principal features of the first fault oil chromatographic data set remain; taking the tuned 8-feature data set as a second fault oil chromatographic data set, which retains 99% of the information in the first fault oil chromatographic data set; and dividing the second fault oil chromatographic data set into a training set and a test set at a ratio of 0.8:0.2;

establishing a parameter-tuned random forest classification model, training it with the training set, and taking the trained model as a first random forest classification model; checking the fault diagnosis accuracy of the first random forest classification model with the test set, and taking it as the final random forest classification model when its fault diagnosis accuracy is greater than or equal to a set fault diagnosis accuracy;

and inputting newly detected transformer oil chromatographic data into the final random forest classification model for fault diagnosis, to obtain a diagnosis result indicating whether the transformer has a fault.

Optionally, the following steps are performed: establishing the principal component analysis model; eliminating the correlation among the dimensional features with it; tuning it so that 8 principal features of the first fault oil chromatographic data set remain; and taking the result as the second fault oil chromatographic data set, which retains 99% of the information in the first fault oil chromatographic data set. These steps comprise:

inputting a sample set $X = \{x^1, x^2, \dots, x^m\}$ in an $n$-dimensional space, where $x^i \in \mathbb{R}^n$, to be mapped to a $k$-dimensional space;

preprocessing, in which the sample mean is changed to 0 by:

$$x^i = x^i - \frac{1}{m}\sum_{j=1}^{m} x^j$$

preprocessing, in which the sample variance is changed to 1 by:

$$x^i = x^i / \sigma$$

where $\sigma$ is the standard deviation of the samples;

calculating the covariance matrix $XX^T$ and performing eigendecomposition on it;

obtaining the $k$ largest eigenvalues and their corresponding $k$ eigenvectors, denoted $\omega^1, \omega^2, \dots, \omega^k$, and outputting the projection matrix $W = (\omega^1, \omega^2, \dots, \omega^k)$, where $\omega^k \in \mathbb{R}^n$. In the parameter tuning process, the reduced dimension equals the number of principal components. The smallest $k$ satisfying the following formula is selected as the number of principal components, so that 99% of the variance of the original data is retained:

$$\frac{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} - x^{(i)}_{\text{approx}} \right\|^2}{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01$$

where $m$ is the number of samples, $x^{(i)}$ is the original data, and $x^{(i)}_{\text{approx}}$ is its projection after dimensionality reduction to $k$ dimensions. The numerator is the sum of squared distances between the original points and their projections. The smaller this error, the more completely the reduced data represent the data before reduction. If the error does not exceed 0.01, the reduced data retain 99% of the information.
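By way of illustration, the preprocessing, eigendecomposition and selection of k can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the applicant's implementation. Samples are stored as rows, so the covariance matrix is computed as ZᵀZ/m rather than XXᵀ. The function name `pca_fit` and the synthetic demo data are hypothetical. Constant features are given σ = 1 to avoid division by zero.

```python
import numpy as np

def pca_fit(X, var_retained=0.99):
    """Standardize X (rows = samples), eigendecompose its covariance matrix and
    keep the smallest k whose relative projection error is <= 1 - var_retained."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0                 # guard for constant features (assumption)
    Z = (X - mu) / sigma                    # zero mean, unit variance per feature
    cov = Z.T @ Z / m                       # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]  # reorder columns: largest first
    total = np.sum(Z ** 2)                  # denominator of the error formula (times m)
    for k in range(1, n + 1):
        W = eigvecs[:, :k]                  # projection matrix, n x k
        Z_approx = Z @ W @ W.T              # projections mapped back to n dimensions
        if np.sum((Z - Z_approx) ** 2) / total <= 1 - var_retained:
            return W, k, mu, sigma
    return eigvecs, n, mu, sigma

# Hypothetical demo: 4 features driven by 2 latent signals plus tiny noise.
rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
X_demo = np.column_stack([a, 2 * a, b, -b]) + 1e-3 * rng.normal(size=(500, 4))
W, k, mu, sigma = pca_fit(X_demo)
X_reduced = ((X_demo - mu) / sigma) @ W     # samples in the k-dimensional space
```

New samples are projected with the same `mu`, `sigma` and `W` fitted on the training data.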

Optionally, establishing the parameter-tuned random forest classification model comprises:

the following parameters are set:

- the impurity measure: criterion='mse';
- whether bootstrap sampling is used: …;
- the number of features considered at each split: max_features='sqrt';
- the maximum depth of each tree: …;
- the minimum number of samples required to split a node: min_samples_split=5;
- the number of decision trees: n_estimators=1000;
- the minimum number of samples at each leaf node: min_samples_leaf=4.
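A scikit-learn configuration matching the listed hyperparameters might look like the sketch below. Two assumptions are flagged. First, scikit-learn's `RandomForestClassifier` does not accept criterion='mse', which is a regression criterion, so the classification default 'gini' is substituted. Second, the bootstrap and max_depth values are not given, so the library defaults are kept. `random_state=1` is a hypothetical addition for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters from the text. criterion='mse' is replaced by 'gini' (assumption),
# because RandomForestClassifier only accepts classification criteria.
# bootstrap and max_depth keep scikit-learn's defaults (values not given in the text).
rf = RandomForestClassifier(
    criterion="gini",
    max_features="sqrt",    # features considered at each split
    min_samples_split=5,    # minimum samples needed to split an internal node
    min_samples_leaf=4,     # minimum samples required in each leaf
    n_estimators=1000,      # number of decision trees
    random_state=1,         # hypothetical, for reproducible results
)
```

The configured estimator is then trained with `rf.fit(X_train, y_train)` and applied with `rf.predict(...)`.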

Optionally, the following steps are performed: ratio-based dimension increase on the extracted fault oil chromatographic data; taking the result as the first fault oil chromatographic data; and integrating the data into a fault oil chromatographic data set. These steps comprise:

the fault oil chromatographic data set comprises the contents of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;

the first fault oil chromatographic data set comprises:

- the contents of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of hydrogen to methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of methane to ethane, ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of ethane to ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of ethylene to acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of acetylene to carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of carbon monoxide to carbon dioxide and total hydrocarbons;
- the ratio of carbon dioxide to total hydrocarbons.

This gives 8 contents and 28 ratios, i.e. 36 features.
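The ratio-based dimension increase can be sketched as follows. The 8 contents are kept, and every pair taken in the listed gas order (earlier gas / later gas) contributes one ratio, giving 8 + 28 = 36 features. The sketch makes several assumptions:

- the input holds the seven measured gases in the listed order;
- TH is computed as CH₄ + C₂H₆ + C₂H₄ + C₂H₂, the definition given with Table 1;
- a small `eps` is added to each denominator to avoid division by zero, which the text does not address;
- the function name `ratio_features` and the sample values are hypothetical.

```python
from itertools import combinations
import numpy as np

GASES = ["H2", "CH4", "C2H6", "C2H4", "C2H2", "CO", "CO2", "TH"]

def ratio_features(contents, eps=1e-6):
    """contents: (samples, 7) array of H2, CH4, C2H6, C2H4, C2H2, CO, CO2.
    Returns a (samples, 36) array (8 contents + 28 pairwise ratios) and names."""
    c = np.asarray(contents, dtype=float)
    th = c[:, 1:5].sum(axis=1, keepdims=True)          # TH = CH4 + C2H6 + C2H4 + C2H2
    base = np.hstack([c, th])                          # the 8 contents
    names = list(GASES)
    ratios = []
    for i, j in combinations(range(len(GASES)), 2):    # (H2, CH4), (H2, C2H6), ...
        ratios.append(base[:, i] / (base[:, j] + eps))  # eps avoids division by zero
        names.append(f"{GASES[i]}/{GASES[j]}")
    return np.hstack([base, np.column_stack(ratios)]), names

# Hypothetical sample contents, in the gas order above (without TH).
X36, names = ratio_features([[10, 20, 5, 15, 0, 100, 1000]])
```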

Optionally, inputting newly detected transformer oil chromatographic data into the final random forest classification model for fault diagnosis, and obtaining a diagnosis result of whether the transformer has a fault, including:

when the diagnosis result indicates that the transformer has a fault, the type and location of the fault are also obtained.

The beneficial effects of the present application are as follows:

The transformer fault diagnosis method based on principal component analysis and random forest fusion provided by the embodiments of the present application proceeds as follows:

- The oil chromatograms of transformers with known fault types are detected, and the extracted fault oil chromatographic data are integrated into a fault oil chromatographic data set.
- Ratio-based dimension increase is performed to obtain a first fault oil chromatographic data set, and the fault types are encoded.
- The correlation among the dimensional features is judged from a correlation heat map of the 36 features in the first data set.
- A principal component analysis model eliminates this correlation and is tuned so that 8 principal features remain. These form a second fault oil chromatographic data set, which retains 99% of the information in the first.
- The second data set is divided into a training set and a test set at a ratio of 0.8:0.2.
- A parameter-tuned random forest classification model is trained with the training set, giving a first random forest classification model, whose fault diagnosis accuracy is checked with the test set. This model is taken as the final random forest classification model when the accuracy is greater than or equal to the set fault diagnosis accuracy.
- Newly detected transformer oil chromatographic data are input into the final model to obtain a diagnosis result indicating whether the transformer has a fault.

The ratio-based dimension increase raises the dimension of the fault oil chromatographic data. The principal component analysis model eliminates the correlation among the dimensional features and reduces the first data set to 8 principal features. Together, the two methods fully mine the information in the fault oil chromatographic data set. The random forest classification model is trained on the training set taken from this information-rich second data set. This improves the accuracy with which the model, and hence the whole diagnosis method, judges whether a power transformer has a fault and of what type.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

To explain the technical solutions of the present application more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can obviously obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic flowchart of a transformer fault diagnosis method based on principal component analysis and random forest fusion according to an embodiment of the present application;

fig. 2 is a schematic process diagram of a transformer fault diagnosis method based on principal component analysis and random forest fusion according to an embodiment of the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

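Referring to Fig. 1, an embodiment of the present application provides a transformer fault diagnosis method based on principal component analysis and random forest fusion. The method addresses the low accuracy of determining the fault type of a power transformer and comprises steps S110 to S160.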

S110: detecting the oil chromatograms of transformers whose fault types are known, extracting the fault oil chromatographic data of the detected faulty transformers, and integrating the data into a fault oil chromatographic data set.

S120: performing ratio-based dimension increase on the fault oil chromatographic data set, taking the resulting data set as a first fault oil chromatographic data set, and encoding the fault types.

In some embodiments, as shown in Fig. 2, the actual coding performs singular value decomposition on the covariance matrix during principal component analysis, which yields the matrix S. The principal component analysis error is then equivalent to:

$$1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}}$$

where S is the diagonal matrix of eigenvalues and $S_{ii}$ is its $i$-th diagonal entry.
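The equivalence between the projection-error criterion and its singular-value form can be checked numerically. In this sketch, NumPy's `np.linalg.svd` is applied to the covariance matrix of standardized data. That matrix is symmetric positive semidefinite, so its singular values equal its eigenvalues. The sketch returns the smallest k whose cumulative ratio reaches 0.99. The function name `svd_choose_k` and the synthetic data are hypothetical.

```python
import numpy as np

def svd_choose_k(Z, var_retained=0.99):
    """Return the smallest k with 1 - sum(S[:k]) / sum(S) <= 1 - var_retained,
    where S holds the singular values of the covariance matrix of Z
    (Z: standardized data, rows = samples)."""
    cov = Z.T @ Z / Z.shape[0]
    U, S, _ = np.linalg.svd(cov)             # S sorted in descending order
    retained = np.cumsum(S) / np.sum(S)      # fraction of variance kept for k = 1..n
    k = int(np.argmax(retained >= var_retained)) + 1
    return k, U[:, :k], S

# Hypothetical demo: compare the SVD form with the direct projection error.
rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
X = np.column_stack([a, 2 * a, b, -b]) + 1e-3 * rng.normal(size=(500, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
k, U_k, S = svd_choose_k(Z)
direct_error = np.sum((Z - Z @ U_k @ U_k.T) ** 2) / np.sum(Z ** 2)
svd_error = 1 - S[:k].sum() / S.sum()
```

The SVD form only needs the singular values computed once, instead of projecting the data for every candidate k.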

In some embodiments, as shown in Fig. 2, the following steps are performed: ratio-based dimension increase on the extracted fault oil chromatographic data; taking the result as the first fault oil chromatographic data; and integrating the data into a fault oil chromatographic data set. These steps comprise:

the first fault oil chromatographic data set comprises:

- the contents of hydrogen, methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of hydrogen to methane, ethane, ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of methane to ethane, ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of ethane to ethylene, acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of ethylene to acetylene, carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of acetylene to carbon monoxide, carbon dioxide and total hydrocarbons;
- the ratios of carbon monoxide to carbon dioxide and total hydrocarbons;
- the ratio of carbon dioxide to total hydrocarbons (TH).

The specific ratios in the first fault oil chromatographic data set are shown in Table 1:

TABLE 1

where TH = CH₄ + C₂H₆ + C₂H₄ + C₂H₂.

In some embodiments, as shown in Fig. 2, the random forest classification model is tree-based and does not rely on vector-space distance measures when processing variables. An encoded value therefore serves only as a category label with no implied order, so label encoding is appropriate. In the present application, LabelEncoder (label encoding) is used to encode the fault types "high temperature overheat", "medium temperature overheat", "low temperature overheat", "partial discharge", "low energy discharge" and "normal" as "1", "2", "3", "4", "5", "6" and "7", respectively.
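The sketch below contrasts scikit-learn's `LabelEncoder` with an explicit mapping. `LabelEncoder` sorts class names alphabetically and numbers them from 0. Reproducing the 1-based codes quoted in the text would therefore need an explicit mapping that numbers the types from 1 in the order they are listed. The helper `encode_in_order` and the list `FAULT_ORDER` are hypothetical. The list holds only the six types the text names, although the text quotes seven codes.

```python
from sklearn.preprocessing import LabelEncoder

# The six fault types in the order the text lists them (hypothetical list name).
FAULT_ORDER = [
    "high temperature overheat",
    "medium temperature overheat",
    "low temperature overheat",
    "partial discharge",
    "low energy discharge",
    "normal",
]

def encode_in_order(labels, ordered_types=FAULT_ORDER):
    """Map each label to its 1-based position in ordered_types."""
    mapping = {name: code for code, name in enumerate(ordered_types, start=1)}
    return [mapping[label] for label in labels]

labels = ["normal", "partial discharge", "low temperature overheat", "normal"]
le = LabelEncoder()
codes = le.fit_transform(labels)   # 0-based codes in alphabetical order of class names
```

Either scheme suits a tree model, because the splits never treat the codes as distances.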

S130: judging the correlation among the dimensional features according to a correlation heat map of the 36 dimensional features in the first fault oil chromatographic data set.
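A correlation heat map is a picture of the Pearson correlation matrix of the features. The sketch below computes that matrix with `np.corrcoef` and lists strongly correlated pairs. The 0.8 threshold, the name `correlated_pairs` and the demo data are hypothetical. A plotting call such as seaborn's `heatmap(corr)` could render the same matrix. Features with zero variance yield NaN correlations and would need handling first.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """Pearson correlation matrix (what a heat map displays) plus the feature
    pairs whose absolute correlation exceeds `threshold` (hypothetical cut-off)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # columns = features
    pairs = []
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return corr, pairs

# Hypothetical demo: f1 is a scaled copy of f0, f2 is independent noise.
rng = np.random.default_rng(0)
a, b = rng.normal(size=100), rng.normal(size=100)
corr, pairs = correlated_pairs(np.column_stack([a, 2 * a, b]), ["f0", "f1", "f2"])
```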

S140: establishing a principal component analysis model, eliminating the correlation among the dimensional features with the model, and tuning it so that 8 principal features of the first fault oil chromatographic data set remain; taking the tuned 8-feature data set as a second fault oil chromatographic data set, which retains 99% of the information in the first fault oil chromatographic data set; and dividing the second fault oil chromatographic data set into a training set and a test set at a ratio of 0.8:0.2.

In some embodiments, as shown in Fig. 2, the following steps are performed: establishing the principal component analysis model; eliminating the correlation among the dimensional features with it; tuning it so that 8 principal features of the first fault oil chromatographic data set remain; and taking the result as the second fault oil chromatographic data set, which retains 99% of the information in the first fault oil chromatographic data set. These steps comprise:

inputting a sample set $X = \{x^1, x^2, \dots, x^m\}$ in an $n$-dimensional space, where $x^i \in \mathbb{R}^n$, to be mapped to a $k$-dimensional space;

preprocessing, in which the sample mean is changed to 0 by:

$$x^i = x^i - \frac{1}{m}\sum_{j=1}^{m} x^j$$

preprocessing, in which the sample variance is changed to 1 by:

$$x^i = x^i / \sigma$$

In some embodiments, one of the most important parameters of the principal component analysis model is n_components. If it is set to an integer, the data are reduced to that number of principal components. If it is set to a decimal, it specifies the fraction of information (variance) that the reduced data must retain. Here the parameter is set as n_components=8, i.e. the data are reduced to 8 principal components.

This setting prevents an excessive loss of information in the second fault oil chromatographic data set when the principal component analysis model eliminates the correlation among the dimensional features. The information in the second data set is thus preserved. This improves the accuracy with which the random forest classification model, and hence the transformer fault diagnosis method as a whole, judges whether a power transformer has a fault and of what type.
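In scikit-learn, the two uses of `n_components` described above might look like this. The sketch standardizes the data first, mirroring the zero-mean, unit-variance preprocessing. It then keeps 8 components as the text specifies and reports how much variance they retain. The alternative passes 0.99 as a fraction, which scikit-learn supports together with `svd_solver="full"`. The function names and the random placeholder data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_to_k(X, k=8):
    """Standardize, keep k principal components (integer n_components), and
    return the reduced data with the fraction of variance it retains."""
    Z = StandardScaler().fit_transform(X)
    pca = PCA(n_components=k)
    return pca.fit_transform(Z), float(pca.explained_variance_ratio_.sum())

def components_for_fraction(X, fraction=0.99):
    """Fractional n_components: PCA picks the smallest k retaining `fraction`."""
    Z = StandardScaler().fit_transform(X)
    pca = PCA(n_components=fraction, svd_solver="full").fit(Z)
    return int(pca.n_components_)

# Hypothetical placeholder data: 60 samples x 36 features.
X_demo = np.random.default_rng(0).normal(size=(60, 36))
Z8, kept = reduce_to_k(X_demo)
k99 = components_for_fraction(X_demo)
```

Comparing `kept` against 0.99 on real data shows whether a fixed 8 components actually meets the 99% target.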

In some embodiments, the data set is divided into a training set and a test set with the train_test_split() function. The test set size is set as test_size=0.2 and the random seed as random_state=1, so that the division is identical on every run and the results are reproducible.
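The split described above corresponds to the call below. The placeholder arrays are hypothetical stand-ins for the 8-component features and the encoded fault labels. Passing `stratify=y`, a real `train_test_split` option, would also keep class proportions equal in both sets; the text does not say whether this is done.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholders: 50 samples with 8 PCA features and 5 fault classes.
X = np.random.default_rng(0).normal(size=(50, 8))
y = np.arange(50) % 5

# 80/20 split; random_state=1 makes the split identical on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
```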

S150: establishing a parameter-tuned random forest classification model, training it with the training set, and taking the trained model as a first random forest classification model; checking the fault diagnosis accuracy of the first random forest classification model with the test set, and taking it as the final random forest classification model when its accuracy is greater than or equal to the set fault diagnosis accuracy.
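The acceptance rule in S150 can be expressed as a small function. The text does not give a value for the "set fault diagnosis accuracy", so the sketch takes it as a parameter. The name `accept_model` is hypothetical, and scikit-learn's bundled iris data stands in for oil chromatography features. Accuracy is computed with scikit-learn's `accuracy_score`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def accept_model(model, X_train, y_train, X_test, y_test, required_accuracy):
    """Fit on the training set, score on the test set, and accept the model
    (as the final model) only if accuracy >= required_accuracy."""
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy >= required_accuracy, accuracy

# Demo on the bundled iris data (a stand-in for oil chromatography features).
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
model = RandomForestClassifier(n_estimators=50, random_state=0)
accepted, accuracy = accept_model(model, X_tr, y_tr, X_te, y_te, required_accuracy=0.9)
```

If the model is rejected, the usual next step would be to revisit the hyperparameters and retrain.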

In some embodiments, as shown in Fig. 2, establishing the parameter-tuned random forest classification model comprises:

In some embodiments, two further parameters of the random forest classification model must be considered: the number n_tree of decision trees constructed, and the number k of input features considered when each node of a decision tree is split. k is usually taken as log₂ n, where n is the number of features in the original data set. A single decision tree is constructed in the following steps:

assuming there are m training samples, each decision tree takes m input samples, drawn randomly from the training set with replacement;

assuming each training sample has n features, k features are randomly selected from the n features when a node is split, and the best of these k input features is used for the split;

each tree keeps splitting until all training samples at a node belong to the same class; no pruning is performed during splitting.
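The three tree-building steps can be illustrated with a hand-rolled forest built from scikit-learn's `DecisionTreeClassifier`. This is a teaching sketch, not the library's own `RandomForestClassifier`:

- each tree is fit on m bootstrap draws;
- `max_features="log2"` limits each split to k = ⌊log₂ n⌋ randomly chosen features (for 36 features, log₂ 36 ≈ 5.17, so 5);
- trees are grown unpruned;
- predictions are combined by majority vote.

The function names are hypothetical. Labels are assumed to be non-negative integers, as produced by label encoding. The iris data is only a stand-in.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_tree=10, seed=0):
    """Grow n_tree unpruned trees, each on m samples drawn with replacement."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    m = X.shape[0]
    trees = []
    for _ in range(n_tree):
        idx = rng.integers(0, m, size=m)           # m draws with replacement
        tree = DecisionTreeClassifier(
            max_features="log2",                   # k = floor(log2 n) features per split
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])                   # defaults grow leaves to purity, no pruning
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    """Majority vote over the trees (labels must be non-negative integers)."""
    votes = np.stack([t.predict(X) for t in trees])          # shape (n_tree, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Demo on the bundled iris data (a stand-in for the encoded fault data).
X, y = load_iris(return_X_y=True)
trees = build_forest(X, y, n_tree=15)
pred = predict_forest(trees, X)
```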

S160: inputting newly detected transformer oil chromatographic data into the final random forest classification model for fault diagnosis, to obtain a diagnosis result indicating whether the transformer has a fault.
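One convenient arrangement for diagnosing new samples, assumed here rather than stated in the text, is a scikit-learn `Pipeline`. It chains standardization, the 8-component PCA and the random forest, so a new sample passes through exactly the transforms fitted on the training data. New measurements would first need the same 36-feature ratio expansion. `make_diagnoser` and the demo data are hypothetical. The forest settings follow the listed hyperparameters, except that the criterion is left at its classification default.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def make_diagnoser(n_components=8, n_estimators=1000, seed=1):
    """Standardize -> PCA (8 components) -> random forest, fitted as one unit."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        RandomForestClassifier(
            n_estimators=n_estimators,
            max_features="sqrt",
            min_samples_split=5,
            min_samples_leaf=4,
            random_state=seed,
        ),
    )

# Hypothetical demo: 90 samples x 36 ratio features, three encoded fault classes.
rng = np.random.default_rng(0)
y = np.repeat([1, 2, 3], 30)
X = rng.normal(size=(90, 36)) + y[:, None]    # class-dependent shift
model = make_diagnoser(n_estimators=50).fit(X, y)
new_sample = X[:1]                            # stands in for a newly measured sample
diagnosis = model.predict(new_sample)
```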

In some embodiments, inputting newly detected transformer oil chromatographic data into the final random forest classification model for fault diagnosis comprises the following: when the diagnosis result indicates that the transformer has a fault, the type and location of the fault are also obtained.

As can be seen from the foregoing embodiments, the transformer fault diagnosis method based on principal component analysis and random forest fusion provided by the embodiments of the present application works as follows. The oil chromatograms of transformers with known fault types are detected. The extracted fault oil chromatographic data are integrated into a fault oil chromatographic data set. Ratio-based dimension increase yields a first fault oil chromatographic data set, and the fault types are encoded. The correlation among the dimensional features is judged from a correlation heat map of the 36 features in the first data set. A principal component analysis model, tuned so that 8 principal features remain, eliminates that correlation. These features form a second fault oil chromatographic data set, which retains 99% of the information in the first.

The second data set is divided into a training set and a test set at a ratio of 0.8:0.2. A parameter-tuned random forest classification model is trained with the training set to give a first random forest classification model. Its fault diagnosis accuracy is checked with the test set. The model becomes the final random forest classification model when that accuracy is greater than or equal to the set fault diagnosis accuracy. Newly detected transformer oil chromatographic data are then input into the final model to obtain a diagnosis result indicating whether the transformer has a fault.

The ratio-based dimension increase raises the dimension of the fault oil chromatographic data. The principal component analysis model eliminates the correlation among the dimensional features and reduces the first data set to 8 principal features. Combining the two fully mines the information in the fault oil chromatographic data set. The random forest classification model is trained on the training set taken from this information-rich second data set. This improves the accuracy with which the model, and therefore the diagnosis method as a whole, judges whether a power transformer has a fault and of what type.

The embodiments above are described with reference to one another. For identical or similar parts, the embodiments in this specification may be referred to each other, and these parts are not described again in detail here.

It is noted that, in this specification, relational terms such as "first" and "second," and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a circuit structure, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such circuit structure, article, or apparatus. Without further limitation, the presence of an element identified by the phrase "comprising a …" does not exclude the presence of other like elements in a circuit structure, article or device comprising the element.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

The above-described embodiments of the present application do not limit the scope of the present application.

Claims

1. A transformer fault diagnosis method based on principal component analysis and random forest fusion, characterized by comprising the following steps:

detecting the oil chromatograms of transformers whose fault types are known, extracting the fault oil chromatographic data of the detected faulty transformers, and integrating the data into a fault oil chromatographic data set;

establishing a parameter-tuned random forest classification model, training it with the training set, and taking the trained model as a first random forest classification model; checking the fault diagnosis accuracy of the first random forest classification model with the test set, and taking it as the final random forest classification model when its fault diagnosis accuracy is greater than or equal to the set fault diagnosis accuracy;

and inputting newly detected transformer oil chromatographic data into the final random forest classification model for fault diagnosis, to obtain a diagnosis result indicating whether the transformer has a fault.

2. The transformer fault diagnosis method based on principal component analysis and random forest fusion according to claim 1, wherein the following are performed: establishing the principal component analysis model; eliminating the correlation among the dimensional features with it; tuning it so that 8 principal features of the first fault oil chromatographic data set remain; and taking the result as the second fault oil chromatographic data set, which retains 99% of the information in the first fault oil chromatographic data set. These comprise the following steps:

inputting a sample set $X = \{x^1, x^2, \dots, x^m\}$ in an $n$-dimensional space, where $x^i \in \mathbb{R}^n$, to be mapped to a $k$-dimensional space;

preprocessing, in which the sample mean is changed to 0 by:

$$x^i = x^i - \frac{1}{m}\sum_{j=1}^{m} x^j$$

preprocessing, in which the sample variance is changed to 1 by:

$$x^i = x^i / \sigma$$

obtaining the $k$ largest eigenvalues and their corresponding $k$ eigenvectors, denoted $\omega^1, \omega^2, \dots, \omega^k$;

outputting the projection matrix $W = (\omega^1, \omega^2, \dots, \omega^k)$, where $\omega^k \in \mathbb{R}^n$. In the parameter tuning process, the reduced dimension equals the number of principal components. The smallest $k$ satisfying the following formula is selected as the number of principal components, so that 99% of the variance of the original data is retained:

$$\frac{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} - x^{(i)}_{\text{approx}} \right\|^2}{\frac{1}{m}\sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01$$

where $m$ is the number of samples, $x^{(i)}$ is the original data, and $x^{(i)}_{\text{approx}}$ represents the data after dimensionality reduction to $k$ dimensions. The numerator represents the sum of squared distances between the original points and their projections. The smaller the error, the more completely the reduced data represent the data before reduction. If the error does not exceed 0.01, the reduced data retain 99% of the information.

3. The transformer fault diagnosis method based on principal component analysis and random forest fusion according to claim 1, wherein establishing the parameter-tuned random forest classification model comprises:

4. The transformer fault diagnosis method based on principal component analysis and random forest fusion according to claim 1, wherein the following are performed: ratio-based dimension increase on the extracted fault oil chromatographic data; taking the result as the first fault oil chromatographic data; and integrating the data into a fault oil chromatographic data set. These comprise the following steps:

5. The transformer fault diagnosis method based on principal component analysis and random forest fusion as claimed in claim 1, wherein the step of inputting newly detected transformer oil chromatographic data into the final random forest classification model for fault diagnosis to obtain a diagnosis result of whether a fault exists in the transformer comprises the steps of:

and when the diagnosis result indicates that the transformer has a fault, the type and location of the fault are also obtained.