CN111860658A - Transformer fault diagnosis method based on cost sensitivity and integrated learning - Google Patents
- Publication number
- CN111860658A (application CN202010721965.5A)
- Authority
- CN
- China
- Prior art keywords
- cost
- learning
- weight
- fault diagnosis
- transformer fault
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24317—Piecewise classification, i.e. whereby each classification requires several discriminant rules
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/20—Administration of product repair or maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/259—Fusion by voting
Abstract
The invention discloses a transformer fault diagnosis method based on cost sensitivity and integrated learning, which comprises the following steps: preprocessing the fault data of various transformers and dividing it into a training sample set and a test set; establishing a transformer fault diagnosis model based on the AdaCost algorithm; training on the training sample set with weight distribution $D_t$ to obtain a weak learner $h_t(x)$; calculating the learning error rate of $h_t(x)$ and the weight it carries in forming the strong classifier; introducing cost factors and updating the weight distribution of each sample in the training sample set; iterating until the number of iterations at which the learning error rate meets the error-rate requirement is reached, and forming a strong learner; and inputting the test set into the strong learner and determining the fault type by voting. The method is based on the AdaCost algorithm, addresses the low overall precision of classifiers on unbalanced data sets, and further improves the accuracy of fault judgment.
Description
Technical Field
The invention relates to the technical field of power equipment fault detection, in particular to a transformer fault diagnosis method based on cost sensitivity and integrated learning.
Background
Deep mining and analysis of power equipment big data with artificial intelligence techniques such as machine learning is the trend in intelligent operation and maintenance. The power transformer is one of the most important pieces of electrical equipment in a power system; knowing its operating state helps improve its operation and maintenance and ensures safe operation of the power grid. Abnormal-state samples of power transformers are scarce, and the records of fault cases and abnormal samples are often missing or incomplete, so the class distribution of transformer sample data sets is unbalanced. Classification models such as artificial neural networks (ANNs) and support vector machines (SVMs) perform well in transformer fault diagnosis, but they are trained to minimize the loss value or to maximize the class margin. On an unbalanced transformer sample set, the separating surface therefore shifts toward the sparsely populated class, the miss rate for fault samples becomes far higher than that for normal samples, and the classification accuracy for fault samples cannot be guaranteed, which can cause significant losses to the power system and even to the economy and daily life.
The class distribution of an unbalanced data set is highly skewed, so a machine learning model may overfit or underfit when performing classification, and its accuracy and robustness drop sharply. Learning from unbalanced data sets is both a focus and a difficulty in machine learning. Researchers have done a great deal of work on improving classification performance for minority-class samples, and the proposed methods fall into 2 levels: the algorithm level and the data level.
The data level mainly comprises undersampling and oversampling, whose essence is to balance the classes by adding minority-class samples or removing majority-class samples. The processing methods for unbalanced data sets are mainly of 4 types: random oversampling, random undersampling, balanced sampling, and synthetic minority oversampling.
The algorithm level is dominated by cost-sensitive methods, which are now widely applied in fields such as image recognition, medical diagnosis, and credit rating. Cost-sensitive learning has the following 3 main implementation modes:
(1) Starting from the learning model: a specific learning method is modified so that it can learn from unbalanced data; perceptrons, support vector machines, decision trees, and neural networks all have cost-sensitive versions. Taking the cost-sensitive decision tree as an example, it can be adapted to unbalanced data in 3 respects, namely decision threshold selection, split criterion selection, and pruning, with a cost matrix introduced into each to make the model adapt to the unbalanced data set.
(2) Starting from Bayes risk theory: cost-sensitive learning is treated as post-processing of the classification result. A model is first learned in the conventional way, and its output is then adjusted with the aim of minimizing the loss.
(3) Starting from preprocessing: the costs are used to adjust the sample weights so that the classifier becomes cost-sensitive; that is, during training, raising the weight of high-cost misclassified samples makes the classifier pay more attention to them. The representative algorithm is AdaCost, which is based on ensemble learning.
In many discussions of cost-sensitive algorithms, the elements of the cost matrix, i.e. the misdiagnosis costs between classes, are given by domain experts on the basis of their domain knowledge and are therefore somewhat subjective. In actual transformer fault diagnosis, the misdiagnosis cost between faults is difficult to specify accurately: domain experts must combine domain knowledge with repeated tests and weigh factors such as fault severity and fault nature, and a cost matrix determined by expert scoring inevitably remains strongly subjective.
Disclosure of Invention
The invention aims to provide a transformer fault diagnosis method based on cost sensitivity and ensemble learning. The method is based on the AdaCost algorithm: it raises the weight of high-cost misclassified samples and limits the weight reduction of high-cost correctly classified samples, which addresses the low overall precision of classifiers on unbalanced data sets and further improves the accuracy of fault judgment.
In order to achieve the purpose, the invention provides the following scheme:
A transformer fault diagnosis method based on cost sensitivity and integrated learning comprises the following steps:
S1, preprocessing various transformer fault data and dividing the data into a training sample set and a test set;
S2, establishing a transformer fault diagnosis model based on the AdaCost algorithm: let the training sample set be $X = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, where $x_i$ is the sample feature vector formed from the gases dissolved in oil and $y_i$ is the fault type label, $x_i \in X$, $y_i \in Y = \{+1, -1\}$; let the number of iterations be $T$, $t = 1, 2, \dots, T$; let the sample weight distribution of the t-th iteration be $D_t = (w_{t1}, w_{t2}, \dots, w_{tm})$, $i = 1, 2, \dots, m$; and let the weak learner formed by the t-th iteration be $h_t(x)$;
S3, training on the training sample set with weight distribution $D_t$ to obtain the weak learner $h_t(x)$;
S4, calculating the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \dots, T$, where $I(x)$ is the error function;
S5, calculating the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \dots, T$;
S6, introducing cost factors and updating the weight distribution of each sample in the training sample set;
S7, letting $t$ take the values $1, 2, \dots, T$ in turn and iterating until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner;
S8, inputting the test set into the strong learner and determining the fault type by voting.
Optionally, in step S4, h is calculatedt(x) Learning error rate e oftT is 1,2, …, and specifically includes: the calculation formula is as follows:
optionally, in step S5, h is calculatedt(x) Weight alpha occupied in forming strong classifiertT is 1,2, …, and specifically includes: the calculation formula is as follows:
optionally, in step S6, a cost factor is introduced, and the weight distribution of each sample in the training sample set is updated, which specifically includes:
wherein, betaiIs a penalty factor, which is obtained by a cost matrix; ztIs a normalization factor, ensures that the sum of the weight distributions of each sample is 1, and has the following calculation formula:
optionally, in step S7, T, 1,2, …, T are sequentially taken, iteration is repeated until the learning error rate meets the iteration number T required by the error rate, and all weak learners are integrated by combining a strategy to form a strong learner, which specifically includes:
the strong classifier h (x) is represented as follows:
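The expressions referred to in steps S4 to S7 are not reproduced in this text. As a point of reference only, the textbook AdaBoost quantities with an AdaCost-style cost factor are commonly written as below. This is an assumption based on the standard algorithm and the symbols defined in step S2, not necessarily the exact form used in the patent; in particular, how $\beta_i$ enters the exponent is a guess.

```latex
\begin{align}
e_t &= \sum_{i=1}^{m} w_{ti}\, I\bigl(h_t(x_i) \neq y_i\bigr)
  && \text{(S4: weighted error rate)} \\
\alpha_t &= \frac{1}{2}\ln\frac{1 - e_t}{e_t}
  && \text{(S5: weight of } h_t \text{)} \\
w_{t+1,i} &= \frac{w_{ti}}{Z_t}\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\, \beta_i\bigr)
  && \text{(S6: cost-adjusted update)} \\
Z_t &= \sum_{i=1}^{m} w_{ti}\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\, \beta_i\bigr)
  && \text{(normalization)} \\
H(x) &= \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Bigr)
  && \text{(S7: strong classifier)}
\end{align}
```

With $\beta_i = 1$ for every sample, the update reduces to plain AdaBoost, where the factor is $\exp(-\alpha_t)$ for a correct prediction and $\exp(\alpha_t)$ for a wrong one.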
According to the specific embodiments provided by the invention, the invention achieves the following technical effects. The transformer fault diagnosis method based on cost sensitivity and integrated learning builds the transformer fault diagnosis model on the AdaCost algorithm. AdaCost is an improvement of AdaBoost that modifies AdaBoost's weight update strategy: a cost factor $\beta_i$ is introduced into $D_t(x)$, and the matrix formed by the $\beta_i$ is called the cost matrix. As a result, the weight of a high-cost misclassified sample is increased substantially, while the weight of a high-cost correctly classified sample is reduced only moderately, so that its reduction is relatively small. The overall idea is that the weight of a high-cost sample rises sharply and falls slowly, which addresses the low overall precision of classifiers on unbalanced data sets. Because AdaCost accounts for differences in misclassification cost, it handles the imbalance of transformer fault data sets well and improves both the recognition of fault samples and the overall classification accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the boosting algorithm;
FIG. 2 is a flow diagram of an AdaBoost training process;
FIG. 3 is a cost matrix composition diagram;
FIG. 4 is a diagram of the transformer fault diagnosis model based on AdaCost.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a transformer fault diagnosis method based on cost sensitivity and ensemble learning. The method is based on the AdaCost algorithm: it raises the weight of high-cost misclassified samples and limits the weight reduction of high-cost correctly classified samples, which addresses the low overall precision of classifiers on unbalanced data sets and further improves the accuracy of fault judgment.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
AdaCost is an improvement of AdaBoost. AdaBoost is an ensemble algorithm based on the boosting method whose core strategy is to iterate repeatedly, updating the sample weights and the base classifier weights. Each round learns one classifier and updates the sample weights according to its performance: the weights of correctly classified samples are reduced and the weights of misclassified samples are increased. The final model is a weighted linear combination of the models from all iterations, in which more accurate classifiers receive larger weights.
In AdaBoost, the weight adjustment coefficient for correctly classified samples is $\exp(-\alpha_t)$, so their weights all shrink by the same proportion; for misclassified samples it is $\exp(\alpha_t)$, so their weights all grow by the same proportion. On an unbalanced sample set AdaBoost does pay more attention to minority-class samples and avoids, to some extent, biasing the classifier toward the majority class; but because it scales weights by the same proportion and ignores cost factors, the improvement in classification performance is limited. AdaCost modifies AdaBoost's weight update strategy: the basic idea is to increase the weight of high-cost misclassified samples substantially and to reduce the weight of high-cost correctly classified samples only moderately, so that the reduction is relatively small. In short, the weight of a high-cost sample rises sharply and falls slowly, which addresses the low overall precision of classifiers on unbalanced data sets.
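The contrast between the two update rules can be sketched numerically. The snippet below is a minimal illustration, not the patent's implementation: the text does not spell out how the cost factor is computed, so the linear form used here (0.5 + 0.5c for a misclassified sample, 0.5 - 0.5c for a correctly classified one, with cost c in [0, 1]) is borrowed from the AdaCost literature as an assumption, and the function names are hypothetical.

```python
import math


def adaboost_factor(alpha, correct):
    """Plain AdaBoost: exp(-alpha) if the sample was classified correctly, exp(+alpha) otherwise."""
    return math.exp(-alpha if correct else alpha)


def adacost_factor(alpha, correct, cost):
    """Cost-adjusted factor for a sample whose misclassification cost lies in [0, 1].

    Assumed cost adjustment (not taken from the patent):
      correct sample:       beta = 0.5 - 0.5 * cost  -> higher cost, smaller reduction
      misclassified sample: beta = 0.5 + 0.5 * cost  -> higher cost, larger increase
    """
    if correct:
        beta = 0.5 - 0.5 * cost
        return math.exp(-alpha * beta)
    beta = 0.5 + 0.5 * cost
    return math.exp(alpha * beta)


if __name__ == "__main__":
    alpha = 0.8  # illustrative weak-learner weight
    print("AdaBoost  correct/wrong:", adaboost_factor(alpha, True), adaboost_factor(alpha, False))
    for cost in (0.1, 0.9):
        print("AdaCost cost", cost,
              "correct/wrong:", adacost_factor(alpha, True, cost), adacost_factor(alpha, False, cost))
```

Under this assumed form, a high-cost sample gets a larger multiplier than a low-cost one in both cases: its weight grows more when it is misclassified and shrinks less when it is classified correctly.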
Because AdaCost is derived from AdaBoost, which is an ensemble algorithm based on the boosting method, the principle of boosting is described first, as shown in FIG. 1. First, an equal initial weight is assigned to each sample in the initial training set, and weak learner 1 is trained; then the weight distribution of the training samples is updated according to the learning error rate, and weak learner 2 is trained; the iteration repeats until the number of iterations T that meets the error-rate requirement is reached; finally, all weak learners are integrated through a combination strategy to form a strong learner.
Three elements of the boosting algorithm:
Let the weak learner be $f_i(x, \theta_i)$ and the strong learner be $F(x)$, where $x$ is the sample feature quantity and $\theta_i$ is the learning error rate.
(1) Function model: the weak learners are superposed through a combination strategy to obtain the strong learner, written as formula (1).
(2) Objective function: given a loss function $E\{F(x)\}$, the objective function $H(x)$ is taken as the optimization target of the algorithm.
(3) Optimization algorithm: the error rate $\theta_i$ is optimized step by step, as shown in formula (3).
Taking the weak learner $f_i(x, \theta_i)$ from the three elements above and choosing the exponential loss function for $E\{F(x)\}$ yields the AdaBoost algorithm. AdaBoost stands for adaptive boosting; it is an adaptive iterative algorithm built on the boosting idea. It is adaptive in that the sample distribution changes according to whether each sample is classified correctly: the weights of correctly classified samples are lowered and the weights of misclassified samples are raised, which increases the probability that misclassified samples are selected for the next weak classifier. The AdaBoost training process is shown in FIG. 2.
AdaCost modifies the weight update strategy of the AdaBoost algorithm by introducing a cost factor $\beta_i$ into $D_t(x)$; the matrix formed by the $\beta_i$ is called the cost matrix. As a result, the weight of a high-cost misclassified sample is increased substantially, and the weight of a high-cost correctly classified sample is reduced only moderately, so that its reduction is relatively small. The core of AdaCost is the determination of the cost matrix, which describes the cost (penalty) information of the data set to be classified and thus determines how the classifier should be trained when different classification errors incur penalties of different strength. The cost matrix is a square matrix of order N, where N is the number of classes in the data set to be classified; its composition is shown in FIG. 3.
Here the matrix element $c_{ij}$ is the cost of misclassifying a sample whose true class is i as class j, and row i of the matrix gives the costs of misclassifying true class-i samples as each of the other classes. An entry with $i = j$ corresponds to a correct prediction of the sample class, and an entry with $i \neq j$ corresponds to an incorrect classification. The cost matrix is set according to domain knowledge of the classification task, and the assignment of $c_{ij}$ generally follows these principles:
(1) the cost of a misclassification must be greater than the cost of a correct classification;
(2) a correct prediction carries no cost, i.e. $c_{ii} = 0$;
(3) the greater the difference in the degree of loss, the larger the difference between the values of $c_{ij}$ and $c_{ji}$.
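Principles (1) and (2) can be checked mechanically. The following is a minimal sketch with a hypothetical helper name and made-up cost values; principle (3) is qualitative and is not checked.

```python
def check_cost_matrix(c):
    """Validate a cost matrix given as a list of rows.

    c[i][j] is the cost of predicting class j for a sample whose true class is i.
    Raises ValueError if the matrix is not square, if a diagonal entry is
    non-zero (principle 2), or if an off-diagonal entry is not strictly
    positive (principle 1, given that correct predictions cost 0).
    """
    n = len(c)
    if any(len(row) != n for row in c):
        raise ValueError("cost matrix must be square (N x N for N classes)")
    for i in range(n):
        for j in range(n):
            if i == j and c[i][j] != 0:
                raise ValueError(f"c[{i}][{j}] must be 0 for a correct prediction")
            if i != j and c[i][j] <= 0:
                raise ValueError(f"c[{i}][{j}] must exceed the cost of a correct prediction")
    return True


if __name__ == "__main__":
    # Made-up 3-class example; the rows could stand for normal, overheating, discharge.
    # Missing a discharge fault (row 2) is priced higher than a false alarm (row 0).
    example = [
        [0, 1, 1],
        [4, 0, 2],
        [6, 3, 0],
    ]
    print(check_cost_matrix(example))
```

Note that the matrix is deliberately asymmetric: $c_{20} = 6$ while $c_{02} = 1$, reflecting principle (3).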
The invention provides a transformer fault diagnosis method based on cost sensitivity and integrated learning, which comprises the following steps:
S1, preprocessing various transformer fault data and dividing the data into a training sample set and a test set;
S2, establishing a transformer fault diagnosis model based on the AdaCost algorithm: let the training sample set be $X = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, where $x_i$ is the sample feature vector formed from the gases dissolved in oil and $y_i$ is the fault type label, $x_i \in X$, $y_i \in Y = \{+1, -1\}$; let the number of iterations be $T$, $t = 1, 2, \dots, T$; let the sample weight distribution of the t-th iteration be $D_t = (w_{t1}, w_{t2}, \dots, w_{tm})$, $i = 1, 2, \dots, m$; and let the weak learner formed by the t-th iteration be $h_t(x)$;
S3, training on the training sample set with weight distribution $D_t$ to obtain the weak learner $h_t(x)$;
S4, calculating the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \dots, T$, where $I(x)$ is the error function;
S5, calculating the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \dots, T$;
S6, introducing cost factors and updating the weight distribution of each sample in the training sample set;
S7, letting $t$ take the values $1, 2, \dots, T$ in turn and iterating until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner;
S8, inputting the test set into the strong learner and determining the fault type by voting.
In step S4, the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \dots, T$, is calculated by the following formula:
In step S5, the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \dots, T$, is calculated by the following formula:
In step S6, introducing cost factors and updating the weight distribution of each sample in the training sample set specifically includes:
wherein $\beta_i$ is a penalty factor obtained from the cost matrix, and $Z_t$ is a normalization factor that ensures the sample weights sum to 1, calculated as follows:
In step S7, letting $t$ take the values $1, 2, \dots, T$ in turn, iterating until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, and integrating all weak learners through a combination strategy to form a strong learner specifically includes:
the strong classifier $H(x)$ is represented as follows:
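Steps S2 to S7 can be sketched end to end for the binary case. The code below is an illustrative sketch under several assumptions, not the patent's implementation: the weak learner is a single-feature threshold stump (the text does not prescribe one here), $\alpha_t$ uses the standard AdaBoost formula, the per-sample costs are supplied directly as numbers in [0, 1] rather than read from a cost matrix, and the cost adjustment is the linear form from the AdaCost literature. All function names and data values are hypothetical.

```python
import math


def train_stump(xs, ys, w):
    """Return (weighted_error, feature, threshold, sign) of the best threshold stump."""
    best = None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            for sign in (1, -1):
                err = sum(wi for x, y, wi in zip(xs, ys, w)
                          if (sign if x[f] >= thr else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, x):
    _, f, thr, sign = stump
    return sign if x[f] >= thr else -sign


def adacost_fit(xs, ys, costs, rounds=10):
    """Train a cost-sensitive boosted ensemble; labels are +1 / -1, costs lie in [0, 1]."""
    m = len(xs)
    w = [1.0 / m] * m                                  # S2: uniform initial weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(xs, ys, w)                 # S3: weak learner under D_t
        e = min(max(stump[0], 1e-10), 1 - 1e-10)       # S4: weighted error, clamped
        if e >= 0.5:                                   # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - e) / e)            # S5: learner weight
        ensemble.append((alpha, stump))
        new_w = []
        for x, y, c, wi in zip(xs, ys, costs, w):      # S6: cost-adjusted update
            if stump_predict(stump, x) == y:
                new_w.append(wi * math.exp(-alpha * (0.5 - 0.5 * c)))
            else:
                new_w.append(wi * math.exp(alpha * (0.5 + 0.5 * c)))
        z = sum(new_w)                                 # normalization factor Z_t
        w = [wi / z for wi in new_w]
    return ensemble


def adacost_predict(ensemble, x):
    """S7: sign of the alpha-weighted sum of weak-learner outputs."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Made-up one-feature data: -1 = normal, +1 = fault (the costlier class to miss).
    xs = [[0.1], [0.2], [0.3], [0.6], [0.7], [0.9]]
    ys = [-1, -1, -1, 1, 1, 1]
    costs = [0.2, 0.2, 0.2, 0.9, 0.9, 0.9]
    model = adacost_fit(xs, ys, costs, rounds=5)
    print([adacost_predict(model, x) for x in xs])
```

A decision tree or any other learner that accepts sample weights could replace the stump; only `train_stump` and `stump_predict` would change.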
The feasibility and effectiveness of applying AdaCost to transformer fault diagnosis are verified using dissolved-gas-in-oil data of power transformers as the sample set, with macro-F1 as the evaluation index of the model, denoted $\alpha_{\text{macro-F1}}$. According to the IEC 60599 standard, transformer faults are classified into 6 types: partial discharge, low-energy discharge, high-energy discharge, low-temperature overheat, medium-temperature overheat, and high-temperature overheat. Because long-term discharge raises the transformer oil temperature and thereby produces a thermal fault, discharge and overheat is added as a 7th fault type.
It follows that transformer fault diagnosis is a multi-class problem, so a one-vs-one strategy is adopted: for the 8 states of a transformer (normal, partial discharge, low-energy discharge, high-energy discharge, low-temperature overheat, medium-temperature overheat, high-temperature overheat, and discharge and overheat), 28 pairwise classifiers are constructed and trained. During testing, each sample is input into all the classifiers and the fault type is determined by voting. The AdaCost-based transformer fault diagnosis model is shown in FIG. 4.
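The count of 28 follows from choosing 2 of the 8 states, 8 × 7 / 2 = 28. A minimal sketch of the pairing and the vote is given below; the `vote` helper and the dummy pairwise classifiers are hypothetical stand-ins for trained binary models.

```python
from collections import Counter
from itertools import combinations

STATES = [
    "normal",
    "partial discharge",
    "low-energy discharge",
    "high-energy discharge",
    "low-temperature overheat",
    "medium-temperature overheat",
    "high-temperature overheat",
    "discharge and overheat",
]

# One binary problem per unordered pair of states: C(8, 2) = 28.
pairs = list(combinations(STATES, 2))


def vote(sample, pairwise_classifiers):
    """Return the state that wins the most pairwise decisions.

    pairwise_classifiers maps each (state_a, state_b) pair to a callable that
    takes a sample and returns either state_a or state_b.
    """
    votes = Counter(clf(sample) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]


def make_dummy(a, b, favourite="high-energy discharge"):
    """Stand-in classifier: picks `favourite` when it is one of the pair, else `a`."""
    return lambda sample: favourite if favourite in (a, b) else a


if __name__ == "__main__":
    classifiers = {(a, b): make_dummy(a, b) for a, b in pairs}
    print(len(pairs), vote(None, classifiers))
```

In practice each pairwise classifier would be a binary AdaCost model trained only on the samples of its two states.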
A cost matrix formed by expert scoring is strongly subjective. The role of $\beta_i$ is to raise the weight of high-cost misclassified samples and to limit the weight reduction of high-cost correctly classified samples. Since a confusion matrix reflects the number of misjudged samples, and a class pair with many misjudged samples is regarded as deserving a high weight, the cost matrix is derived from the confusion matrix obtained by decision tree training.
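The text does not state the rule by which the cost matrix is derived from the confusion matrix. One plausible normalization, shown here purely as an assumption, sets each off-diagonal cost to the share of true class-i samples that the decision tree predicted as class j, so that frequently confused pairs receive higher costs. The function name and the counts are made up for illustration.

```python
def cost_matrix_from_confusion(conf):
    """Derive a cost matrix from a confusion matrix (rows = true class, columns = predicted).

    Assumed rule (not taken from the patent):
        cost[i][j] = conf[i][j] / (number of true class-i samples)  for i != j
        cost[i][i] = 0
    """
    n = len(conf)
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = sum(conf[i])
        for j in range(n):
            if i != j and total > 0:
                cost[i][j] = conf[i][j] / total
    return cost


if __name__ == "__main__":
    # Made-up confusion counts from a hypothetical baseline decision tree.
    confusion = [
        [50, 2, 1],
        [4, 10, 6],
        [1, 3, 8],
    ]
    for row in cost_matrix_from_confusion(confusion):
        print([round(v, 3) for v in row])
```

Under this rule a pair that the baseline never confuses gets a cost of 0, which conflicts with the principle that every misclassification should cost more than a correct prediction; a small positive floor could be added if that principle must hold strictly.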
An evaluation index suited to the classification of unbalanced data is selected: the overall classification measure macro-F1 is taken as the evaluation index of the classifier, denoted $\alpha_{\text{macro-F1}}$. Comparing the transformer fault diagnosis results of the decision tree model and the AdaCost model, the AdaCost model raises $\alpha_{\text{macro-F1}}$ over the decision tree model ($\alpha_{\text{macro-F1}} = 0.7152$) by 12.1 percentage points, a relative improvement of 16.91%. This shows that the method handles the imbalance of the transformer fault data set well and improves both the recognition of fault samples and the overall classification accuracy. The AdaCost algorithm also accounts for differences in misclassification cost, which matches practical engineering needs.
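Macro-F1 is the unweighted mean of the per-class F1 scores, so each minority fault class counts as much as the majority class. A minimal sketch of the computation from a confusion matrix follows; the function name and the counts in the example are illustrative only.

```python
def macro_f1(conf):
    """Macro-averaged F1 from a confusion matrix (rows = true class, columns = predicted).

    Per class k: F1_k = 2*TP / (2*TP + FP + FN); a class with no true or
    predicted samples contributes 0. The result is the plain mean over classes.
    """
    n = len(conf)
    scores = []
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                           # true class k, predicted as another
        fp = sum(conf[r][k] for r in range(n)) - tp      # another class, predicted as k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    # Made-up two-class counts.
    print(macro_f1([[8, 2], [1, 9]]))
```

As a consistency check on the reported figures, 0.121 / 0.7152 ≈ 0.169, which matches the 16.91% figure if the 12.1% is read as an absolute gain over the decision tree baseline.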
The transformer fault diagnosis method based on cost sensitivity and integrated learning provided by the invention builds the transformer fault diagnosis model on the AdaCost algorithm. AdaCost is an improvement of AdaBoost that modifies AdaBoost's weight update strategy: a cost factor $\beta_i$ is introduced into $D_t(x)$, and the matrix formed by the $\beta_i$ is called the cost matrix. As a result, the weight of a high-cost misclassified sample is increased substantially, while the weight of a high-cost correctly classified sample is reduced only moderately, so that its reduction is relatively small. The overall idea is that the weight of a high-cost sample rises sharply and falls slowly, which addresses the low overall precision of classifiers on unbalanced data sets. Because AdaCost accounts for differences in misclassification cost, it handles the imbalance of transformer fault data sets well and improves both the recognition of fault samples and the overall classification accuracy. The AdaCost algorithm is suitable not only for fault diagnosis but also for other classification applications with unbalanced data, and it generalizes well.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (5)
1. A transformer fault diagnosis method based on cost sensitivity and integrated learning, characterized by comprising the following steps:
S1, preprocessing the various transformer fault data and dividing them into a training sample set and a test set;
S2, establishing a transformer fault diagnosis model based on the AdaCost algorithm: let the training sample set be $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is the sample feature vector formed from the gases dissolved in oil and $y_i$ is the fault type label, $x_i \in X$, $y_i \in Y = \{+1, -1\}$; let the number of iterations be $T$, $t = 1, 2, \ldots, T$; let the sample weight distribution of the $t$-th iteration be $D_t = (w_{t1}, w_{t2}, \ldots, w_{tm})$, where $w_{ti}$ is the weight of the $i$-th sample, $i = 1, 2, \ldots, m$; and let the weak learner formed by the $t$-th iteration be $h_t(x)$;
S3, training on the training sample set with the weight distribution $D_t$ to obtain the weak learner $h_t(x)$;
S4, calculating the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \ldots, T$, where $I(x)$ is the error function;
S5, calculating the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \ldots, T$;
S6, introducing cost factors and updating the weight distribution of each sample in the training sample set;
S7, letting $t$ take the values $1, 2, \ldots, T$ in sequence and iterating repeatedly until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner;
S8, inputting the test set into the strong learner and determining the fault type by voting.
4. The transformer fault diagnosis method based on cost sensitivity and integrated learning of claim 1, wherein step S6, introducing cost factors and updating the weight distribution of each sample in the training sample set, specifically comprises:
wherein $\beta_i$ is a penalty factor obtained from the cost matrix, and $Z_t$ is a normalization factor that ensures the weights of all samples sum to 1, with the following calculation formula:
5. The transformer fault diagnosis method based on cost sensitivity and integrated learning of claim 1, wherein step S7, letting $t$ take the values $1, 2, \ldots, T$ in sequence and iterating repeatedly until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner, specifically comprises:
the strong classifier $H(x)$ is expressed as follows:
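As a reading aid for steps S2 to S8 of the claims, the following Python sketch runs the whole boosting loop on toy data. It is not the claimed implementation: the formulas for $e_t$, $\alpha_t$, the weight update, and $H(x)$ are not reproduced in this text, so the sketch assumes a uniform initial weight distribution, the AdaBoost form $\alpha_t = \frac{1}{2}\ln\frac{1 - e_t}{e_t}$, the cost-adjustment rule of the original AdaCost algorithm, and a sign-of-weighted-vote strong classifier. The decision-stump weak learner, all function names, and all data values are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Pick the threshold stump (err, feature, threshold, sign) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adacost_fit(X, y, costs, T=10, eps=1e-10):
    m = len(X)
    w = [1.0 / m] * m                            # assumed uniform initial D_1
    ensemble = []
    for _ in range(T):                           # S7: iterate t = 1..T
        stump = train_stump(X, y, w)             # S3: weak learner h_t
        e = min(max(stump[0], eps), 1 - eps)     # S4: weighted error rate e_t
        if e >= 0.5:                             # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - e) / e)      # S5: assumed AdaBoost form
        ensemble.append((alpha, stump))
        scaled = []
        for wi, row, yi, c in zip(w, X, y, costs):   # S6: cost-sensitive update
            h = stump_predict(stump, row)
            beta = 0.5 - 0.5 * c if h == yi else 0.5 + 0.5 * c  # assumed rule
            scaled.append(wi * math.exp(-alpha * yi * h * beta))
        z = sum(scaled)                          # normalization factor Z_t
        w = [s / z for s in scaled]
    return ensemble

def adacost_predict(ensemble, row):
    """S8: weighted vote of the weak learners (assumed sign-of-sum form)."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy one-feature data, not transformer measurements.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [-1, -1, -1, 1, 1, 1]
costs = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]   # +1 class treated as the costly one
ensemble = adacost_fit(X, y, costs, T=5)
predictions = [adacost_predict(ensemble, row) for row in X]
```

The error rate is clamped away from 0 and 1 so that the logarithm stays finite on separable toy data; a real diagnosis model would use feature vectors built from dissolved-gas measurements and a cost matrix chosen for the fault classes.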
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010721965.5A CN111860658A (en) | 2020-07-24 | 2020-07-24 | Transformer fault diagnosis method based on cost sensitivity and integrated learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111860658A (en) | 2020-10-30 |
Family
ID=72950503
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010721965.5A Pending CN111860658A (en) | 2020-07-24 | 2020-07-24 | Transformer fault diagnosis method based on cost sensitivity and integrated learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860658A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105930861A (en) * | 2016-04-13 | 2016-09-07 | 西安西拓电气股份有限公司 | Adaboost algorithm based transformer fault diagnosis method |
CN108304884A (en) * | 2018-02-23 | 2018-07-20 | 华东理工大学 | A kind of cost-sensitive stacking integrated study frame of feature based inverse mapping |
CN109376801A (en) * | 2018-12-04 | 2019-02-22 | 西安电子科技大学 | Blade of wind-driven generator icing diagnostic method based on integrated deep neural network |
Non-Patent Citations (4)
Title |
---|
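Wei Fan et al.: "AdaCost: Misclassification Cost-Sensitive Boosting", Proceedings of the Sixteenth International Conference on Machine Learning * |
Cui Yu et al.: "Power transformer fault diagnosis method considering unbalanced case samples", High Voltage Engineering * |
Li Yong et al.: "Survey of ensemble classification algorithms for unbalanced data", Application Research of Computers * |
Tan Hao; Tian Aikui; Wu Zhiyong: "A cost-sensitive ensemble algorithm for class imbalance", Journal of Shandong University of Technology (Natural Science Edition) * |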
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308146A (en) * | 2020-11-02 | 2021-02-02 | 国网福建省电力有限公司 | Distribution transformer fault identification method based on operation characteristics |
CN112580715A (en) * | 2020-12-16 | 2021-03-30 | 珠海格力电器股份有限公司 | Household equipment fault detection method, device, equipment and medium |
CN112633900A (en) * | 2020-12-16 | 2021-04-09 | 北京国电通网络技术有限公司 | Industrial Internet of things data verification method based on machine learning |
CN112580715B (en) * | 2020-12-16 | 2024-05-07 | 珠海格力电器股份有限公司 | Household equipment fault detection method, device, equipment and medium |
CN112733913A (en) * | 2020-12-31 | 2021-04-30 | 浙江禾连网络科技有限公司 | Child and old person cooperative property safety detection method based on cost Adaboost algorithm |
JP7511496B2 (en) | 2021-01-27 | 2024-07-05 | 株式会社日立製作所 | Incremental reinforcement learning apparatus and method |
CN113702728A (en) * | 2021-07-12 | 2021-11-26 | 广东工业大学 | Transformer fault diagnosis method and system based on combined sampling and LightGBM |
CN113419519B (en) * | 2021-07-14 | 2022-05-13 | 北京航空航天大学 | Electromechanical product system or equipment real-time fault diagnosis method based on width learning |
CN113419519A (en) * | 2021-07-14 | 2021-09-21 | 北京航空航天大学 | Electromechanical product system or equipment real-time fault diagnosis method based on width learning |
CN113569957A (en) * | 2021-07-29 | 2021-10-29 | 中国工商银行股份有限公司 | Object type identification method and device of business object and storage medium |
CN113610148B (en) * | 2021-08-04 | 2024-02-02 | 北京化工大学 | Fault diagnosis method based on bias weighted AdaBoost |
CN113610148A (en) * | 2021-08-04 | 2021-11-05 | 北京化工大学 | Fault diagnosis method based on bias weighting AdaBoost |
CN113780394A (en) * | 2021-08-31 | 2021-12-10 | 厦门理工学院 | Training method, device and equipment of strong classifier model |
CN113780394B (en) * | 2021-08-31 | 2023-05-09 | 厦门理工学院 | Training method, device and equipment for strong classifier model |
CN113935440A (en) * | 2021-12-15 | 2022-01-14 | 武汉格蓝若智能技术有限公司 | Iterative evaluation method and system for error state of voltage transformer |
CN114511399A (en) * | 2022-02-15 | 2022-05-17 | 电子科技大学 | Abnormal data screening method for internet financial wind control |
CN114511399B (en) * | 2022-02-15 | 2023-12-15 | 电子科技大学 | Abnormal data identification and elimination method |
CN114548306A (en) * | 2022-02-28 | 2022-05-27 | 西南石油大学 | Intelligent monitoring method for early drilling overflow based on misclassification cost |
CN114722923B (en) * | 2022-03-22 | 2024-02-27 | 西北工业大学 | Lightweight electromechanical equipment fault diagnosis method |
CN114722923A (en) * | 2022-03-22 | 2022-07-08 | 西北工业大学 | Light electromechanical equipment fault diagnosis method |
CN114997063A (en) * | 2022-06-17 | 2022-09-02 | 华北电力大学 | Power grid transient stability prediction method and system based on cost sensitive support vector machine |
CN115407753B (en) * | 2022-08-18 | 2024-02-09 | 广东元梦泽技术服务有限公司 | Industrial fault diagnosis method for multi-variable weighting integrated learning |
CN115407753A (en) * | 2022-08-18 | 2022-11-29 | 广东元梦泽技术服务有限公司 | Industrial fault diagnosis method for multivariate weighted ensemble learning |
CN117786538A (en) * | 2023-12-06 | 2024-03-29 | 国网上海市电力公司 | CsAdaBoost integrated learning algorithm based on cost sensitivity improvement |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20201030 |