CN111860658A - Transformer fault diagnosis method based on cost sensitivity and integrated learning - Google Patents

Transformer fault diagnosis method based on cost sensitivity and integrated learning

Info

Publication number
CN111860658A
Authority
CN
China
Prior art keywords
cost
learning
weight
fault diagnosis
transformer fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010721965.5A
Other languages
Chinese (zh)
Inventor
刘云鹏
和家慧
刘一瑾
王权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University
Priority to CN202010721965.5A
Publication of CN111860658A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24317 Piecewise classification, i.e. whereby each classification requires several discriminant rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses a transformer fault diagnosis method based on cost sensitivity and ensemble learning, comprising the following steps: preprocessing transformer fault data of various types and dividing the data into a training sample set and a test set; establishing a transformer fault diagnosis model based on the AdaCost algorithm; training on the training sample set with weight distribution $D_t$ to obtain a weak learner $h_t(x)$; calculating the learning error rate of $h_t(x)$ and the weight it carries in forming the strong classifier; introducing cost factors and updating the weight distribution of the samples in the training sample set; iterating until the number of iterations at which the learning error rate meets the error-rate requirement is reached, and forming a strong learner; and inputting the test set into the strong learner and determining the fault type by voting. Being based on the AdaCost algorithm, the method solves the problem of low overall classifier precision on unbalanced data sets and further improves fault judgment accuracy.

Description

Transformer fault diagnosis method based on cost sensitivity and integrated learning
Technical Field
The invention relates to the technical field of power equipment fault detection, in particular to a transformer fault diagnosis method based on cost sensitivity and integrated learning.
Background
Using artificial intelligence techniques such as machine learning to mine and analyze power equipment big data in depth is the trend in intelligent operation and maintenance. The power transformer is one of the most important pieces of electrical equipment in a power system; knowing its operating state helps raise the level of its operation and maintenance and ensures the safe operation of the power grid. Abnormal-state samples of power transformers are scarce, and the information in fault cases and abnormal samples is often missing or incomplete, so the class distribution of transformer sample data sets is unbalanced. Classification models such as artificial neural networks (ANNs) and support vector machines (SVMs) perform well in transformer fault diagnosis, but on an unbalanced transformer sample set, because they minimize the loss value or maximize the class margin, the separating surface shifts toward the sparsely populated class. The missed-detection rate of fault samples then becomes far higher than that of normal samples, and the classification accuracy of fault samples cannot be guaranteed, which can cause significant losses to the power system and even to the economy and daily life.
In an unbalanced data set the class counts are extremely uneven, so a machine learning model can over-fit or under-fit when performing classification, and its accuracy and robustness drop sharply. Learning from unbalanced data sets is both a focus and a difficulty in machine learning. Researchers have done a great deal of work on improving classification performance for minority-class samples, and the proposed methods fall into two levels: the algorithm level and the data level.
The data level mainly comprises undersampling and oversampling, whose essence is to balance the samples by adding minority-class samples or removing majority-class samples. The processing methods for unbalanced data sets mainly comprise four types: random oversampling, random undersampling, balanced sampling, and synthetic minority oversampling.
The algorithm level, by contrast, relies mainly on cost-sensitive methods, which are now widely applied in fields such as image recognition, medical diagnosis, and credit rating. Cost-sensitive learning has three main implementation modes:
(1) Starting from the learning model, a specific learning method is modified so that it can learn from unbalanced data; the perceptron, support vector machine, decision tree, and neural network each have cost-sensitive versions. Taking the cost-sensitive decision tree as an example, it can be adapted to unbalanced data in three respects: decision threshold selection, splitting criterion selection, and pruning. A cost matrix is introduced into each of these so that the model adapts to the unbalanced data set.
(2) Based on Bayes risk theory, cost-sensitive learning is treated as post-processing of the classification result: a model is learned by the traditional method, and its results are then adjusted with the aim of minimizing loss.
(3) From the preprocessing perspective, the costs are used to adjust the weights so that the classifier becomes cost-sensitive; that is, during training the weights of high-cost misclassified samples are raised so that the classifier pays more attention to them. The representative algorithm is AdaCost, which is based on ensemble learning.
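Mode (2) above can be illustrated with a minimal sketch of a minimum-expected-cost decision. The function name `min_cost_class`, the two-class setup, and all numbers are hypothetical and are not taken from the patent; `cost[i][j]` is assumed to be the cost of predicting class j when the true class is i.

```python
def min_cost_class(posteriors, cost):
    """Return (best class, expected costs) where the expected cost of
    predicting j is sum_i posteriors[i] * cost[i][j]."""
    n = len(posteriors)
    expected = [sum(posteriors[i] * cost[i][j] for i in range(n))
                for j in range(n)]
    best = min(range(n), key=expected.__getitem__)
    return best, expected

# Hypothetical example: class 0 = normal, class 1 = fault.
posteriors = [0.7, 0.3]          # output of a conventionally trained model
cost = [[0.0, 1.0],              # true normal: a false alarm costs 1
        [5.0, 0.0]]              # true fault: a missed fault costs 5
label, expected = min_cost_class(posteriors, cost)
```

Although "normal" is the more probable class in this example, the higher cost of a missed fault makes "fault" the lower-risk decision.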
In many discussions of cost-sensitive algorithms, the elements of the cost matrix, namely the misdiagnosis costs between categories, are given by domain experts on the basis of domain knowledge and are therefore somewhat subjective. In actual transformer fault diagnosis, misdiagnosis costs between faults are difficult to specify accurately: domain experts must combine domain knowledge with repeated tests and weigh factors such as fault severity and fault nature, and a cost matrix determined by expert scoring is inevitably strongly subjective.
Disclosure of Invention
The invention aims to provide a transformer fault diagnosis method based on cost sensitivity and ensemble learning. Based on the AdaCost algorithm, the method increases the weight of high-cost misclassified samples and reduces the weight of high-cost correctly classified samples, which solves the problem of low overall classifier precision on unbalanced data sets and further improves fault judgment accuracy.
To achieve this purpose, the invention provides the following scheme:
A transformer fault diagnosis method based on cost sensitivity and ensemble learning comprises the following steps:
S1, preprocessing transformer fault data of various types, and dividing the data into a training sample set and a test set;
S2, establishing a transformer fault diagnosis model based on the AdaCost algorithm: let the training sample set be $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is the feature vector of a sample formed from the gases dissolved in oil and $y_i$ is the fault type label, with $x_i \in X$ and $y_i \in Y = \{+1, -1\}$; let the number of iterations be $T$, with $t = 1, 2, \ldots, T$; let the sample weight distribution of the $t$-th iteration be $D_t = (w_{t1}, w_{t2}, \ldots, w_{ti})$, $i = 1, 2, \ldots, m$, and
Figure BDA0002600355100000031
let the weak learner formed by the $t$-th iteration be $h_t(x)$;
S3, training on the training sample set with weight distribution $D_t$ to obtain the weak learner $h_t(x)$;
S4, calculating the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \ldots, T$, where $I(x)$ is the error function;
S5, calculating the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \ldots, T$;
S6, introducing cost factors and updating the weight distribution of the samples in the training sample set;
S7, letting $t$ take the values $1, 2, \ldots, T$ in sequence and iterating until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner;
S8, inputting the test set into the strong learner, and determining the fault type by voting.
Optionally, in step S4, the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \ldots, T$, is calculated by the following formula:
Figure BDA0002600355100000032
Optionally, in step S5, the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \ldots, T$, is calculated by the following formula:
Figure BDA0002600355100000033
Optionally, in step S6, the cost factor is introduced and the weight distribution of the samples in the training sample set is updated according to the following formula:
Figure BDA0002600355100000034
where $\beta_i$ is a penalty factor obtained from the cost matrix, and $Z_t$ is a normalization factor that ensures the sample weights sum to 1, calculated as follows:
Figure BDA0002600355100000035
Optionally, in step S7, $t$ takes the values $1, 2, \ldots, T$ in sequence, the iteration is repeated until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, and all weak learners are integrated through a combination strategy to form a strong learner, specifically:
the strong classifier $H(x)$ is expressed as follows:
Figure BDA0002600355100000041
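The formula images referenced in steps S2 and S4 to S7 are not reproduced in this text. As a reading aid only, the block below gives the standard AdaBoost-style forms that match the symbols defined above, with the cost factor $\beta_i$ placed in the exponent of the weight update. These forms are an assumption based on the usual boosting derivation, not a transcription of the patent's figures; in particular, the published formula for $\alpha_t$ may differ.

```latex
\begin{align}
\sum_{i=1}^{m} w_{ti} &= 1 \\
e_t &= \sum_{i=1}^{m} w_{ti}\, I\bigl(h_t(x_i) \neq y_i\bigr) \\
\alpha_t &= \frac{1}{2} \ln \frac{1 - e_t}{e_t} \\
w_{t+1,i} &= \frac{w_{ti}}{Z_t} \exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\, \beta_i\bigr) \\
Z_t &= \sum_{i=1}^{m} w_{ti} \exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\, \beta_i\bigr) \\
H(x) &= \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Bigr)
\end{align}
```

With $\beta_i = 1$ for every sample, the update reduces to the ordinary AdaBoost rule, which is one way to check that the cost factor is the only change.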
According to the specific embodiments provided by the invention, the invention discloses the following technical effects. The transformer fault diagnosis method based on cost sensitivity and ensemble learning builds a transformer fault diagnosis model on the AdaCost algorithm. AdaCost is an improvement on AdaBoost that modifies AdaBoost's weight update strategy: a cost factor $\beta_i$ is introduced into $D_t(x)$, and the matrix formed by the $\beta_i$ is called the cost matrix. As a result, the weight of a high-cost misclassified sample is increased greatly, while the weight of a high-cost correctly classified sample is reduced only modestly. The overall idea is that the weights of high-cost samples rise quickly and fall slowly, which solves the problem of low overall classifier precision on unbalanced data sets. Because AdaCost takes the differing costs of misclassification into account, it handles the imbalance of transformer fault data sets well and improves both the recognition of fault samples and the overall classification accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the boosting algorithm;
FIG. 2 is a flow diagram of an AdaBoost training process;
FIG. 3 is a cost matrix composition diagram;
FIG. 4 is a diagram of the AdaCost-based transformer fault diagnosis model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a transformer fault diagnosis method based on cost sensitivity and ensemble learning. Based on the AdaCost algorithm, the method increases the weight of high-cost misclassified samples and reduces the weight of high-cost correctly classified samples, which solves the problem of low overall classifier precision on unbalanced data sets and further improves fault judgment accuracy.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The AdaCost algorithm is an improvement on the AdaBoost algorithm. AdaBoost is a tree ensemble algorithm based on the boosting method, and its core strategy is to iterate repeatedly while updating the sample weights and the base classifier weights. Each round learns one classifier and updates the sample weights according to its performance: the weights of correctly classified samples are reduced and the weights of misclassified samples are increased. The final model is a weighted linear combination of the models from all iterations, in which more accurate classifiers receive larger weights.
In AdaBoost, the sample weight adjustment coefficient for correctly classified samples is $\exp(-\alpha_t)$, so their weights all shrink by the same proportion; for misclassified samples it is $\exp(\alpha_t)$, so their weights all grow by the same proportion. On an unbalanced sample set AdaBoost does pay more attention to minority-class samples and, to some extent, keeps the classifier from drifting toward the majority class, but because it scales weights by the same proportion and ignores cost factors, the gain in classification performance is limited. The AdaCost algorithm modifies the weight update strategy of AdaBoost: the basic idea is to increase the weight of a high-cost misclassified sample greatly and to reduce the weight of a high-cost correctly classified sample only modestly. Overall, the weights of high-cost samples rise quickly and fall slowly, which solves the problem of low overall classifier precision on unbalanced data sets.
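The difference between the two update rules can be made concrete with a small numerical sketch. The error rate of 0.2, the AdaBoost form of $\alpha_t$, and the cost factors 1.5 and 0.5 are illustrative assumptions, not values from the patent.

```python
import math

def adaboost_factor(alpha, correct):
    """Multiplicative weight change in AdaBoost: exp(-alpha) or exp(+alpha)."""
    return math.exp(-alpha if correct else alpha)

def adacost_factor(alpha, correct, beta):
    """Same change with a cost factor beta scaling the exponent (assumed form)."""
    return math.exp((-alpha if correct else alpha) * beta)

e_t = 0.2                                   # hypothetical weighted error rate
alpha = 0.5 * math.log((1 - e_t) / e_t)     # AdaBoost classifier weight

boost_down = adaboost_factor(alpha, correct=True)     # every correct sample
boost_up = adaboost_factor(alpha, correct=False)      # every wrong sample
cost_up = adacost_factor(alpha, correct=False, beta=1.5)    # high-cost, wrong
cost_down = adacost_factor(alpha, correct=True, beta=0.5)   # high-cost, correct
```

Under these assumptions the high-cost misclassified sample grows faster than in AdaBoost (`cost_up > boost_up`), and the high-cost correctly classified sample shrinks more slowly (`cost_down > boost_down`).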
Since AdaCost is an improvement on AdaBoost, which is a tree ensemble algorithm based on the boosting method, the principle of the boosting algorithm is described first, as shown in FIG. 1. First, an equal initial weight is assigned to each sample in the initial training set, and weak learner 1 is trained. Then the weight distribution of the training samples is updated according to the learning error rate, and weak learner 2 is trained. The iteration is repeated until the number of iterations $T$ that meets the error-rate requirement is reached. Finally, all weak learners are integrated through a combination strategy to form a strong learner.
Three elements of the Boosting algorithm:
Let the weak learner be $f_i(x, \theta_i)$ and the strong learner be $F(x)$, where $x$ is the sample feature quantity and $\theta_i$ is the learning error rate.
(1) Function model: the weak learners are superposed through a combination strategy to obtain the strong learner, which can be written as formula (1).
Figure BDA0002600355100000061
(2) Objective function: let the loss function be $E\{F(x)\}$, and take the objective function $H(x)$ as the optimization target of the algorithm, as in formula (2).
Figure BDA0002600355100000062
(3) Optimization algorithm: the error rate $\theta_i$ is optimized step by step, as shown in formula (3).
Figure BDA0002600355100000063
Fixing the form of the weak learner $f_i(x, \theta_i)$ in the three elements and selecting $E\{F(x)\}$ as an exponential loss function gives the AdaBoost algorithm. AdaBoost stands for adaptive boosting and is an adaptive iterative algorithm based on the boosting idea. It is adaptive in that the sample distribution changes according to whether each sample is classified correctly: correctly classified samples receive lower weights and misjudged samples receive higher weights, which raises the probability that misjudged samples are selected for the next weak classifier. The AdaBoost training process is shown in FIG. 2.
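For orientation, the exponential loss usually associated with AdaBoost is written out below. The notation ($\alpha_t$ for the weight of the $t$-th weak classifier $h_t$, labels $y_i \in \{+1, -1\}$, $m$ samples) follows the common textbook presentation and is an assumption; it is not a transcription of formulas (1) to (3), whose images are not reproduced here.

```latex
F(x) = \sum_{t=1}^{T} \alpha_t\, h_t(x),
\qquad
E\{F(x)\} = \sum_{i=1}^{m} \exp\bigl(-y_i\, F(x_i)\bigr)
```

Each term of the loss is a product of per-round factors $\exp(-\alpha_t\, y_i\, h_t(x_i))$, which is where the per-sample weight changes of $\exp(-\alpha_t)$ for a correct classification and $\exp(\alpha_t)$ for a misclassification come from.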
AdaCost modifies the weight update strategy of the AdaBoost algorithm by introducing a cost factor $\beta_i$ into $D_t(x)$; the matrix formed by the $\beta_i$ is called the cost matrix. The weight of a high-cost misclassified sample is thereby increased greatly, and the weight of a high-cost correctly classified sample is reduced only modestly. The core of AdaCost lies in determining the cost matrix, which describes the cost (penalty) information on the data set to be classified and thus determines how a classifier should be trained when different classification errors incur different penalties. The cost matrix is an N-order square matrix, where N is the number of categories in the data set to be classified; its structure is shown in FIG. 3.
Here the matrix element $c_{ij}$ represents the cost of misclassifying a sample whose true class is $i$ into class $j$, and row $i$ of the matrix holds the costs of misclassifying true class-$i$ samples into the other classes. An entry with $i = j$ means the algorithm predicted the sample class correctly, and an entry with $i \neq j$ corresponds to an incorrect classification result. The cost matrix is set according to domain knowledge of the classification task, and the assignment of $c_{ij}$ generally follows these principles:
(1) the cost of misclassification must be greater than the cost of correct classification;
(2) a correct prediction carries no cost, i.e. $c_{ii} = 0$;
(3) the greater the difference in the degree of loss, the greater the difference between the values of $c_{ij}$ and $c_{ji}$.
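Principles (1) and (2) can be checked mechanically. The sketch below is a hypothetical helper (`cost_matrix_violations` is not a name from the patent) that lists every entry of a candidate cost matrix that breaks them; principle (3) depends on domain judgment and is not checked.

```python
def cost_matrix_violations(c):
    """Return a list of human-readable violations of principles (1) and (2).

    c is a square list of lists; c[i][j] is the cost of predicting class j
    for a sample whose true class is i.
    """
    n = len(c)
    problems = []
    if any(len(row) != n for row in c):
        return ["matrix is not square"]
    for i in range(n):
        if c[i][i] != 0:
            problems.append(f"c[{i}][{i}] should be 0")           # principle (2)
        for j in range(n):
            if i != j and c[i][j] <= c[i][i]:
                problems.append(f"c[{i}][{j}] should exceed c[{i}][{i}]")  # (1)
    return problems

# Hypothetical 2-class matrices: one valid, one with a free misclassification.
ok = cost_matrix_violations([[0, 2], [5, 0]])
bad = cost_matrix_violations([[0, 0], [5, 0]])
```

Here `ok` is empty, while `bad` reports the single off-diagonal entry whose cost does not exceed the cost of a correct prediction.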
The invention provides a transformer fault diagnosis method based on cost sensitivity and ensemble learning, which comprises the following steps:
S1, preprocessing transformer fault data of various types, and dividing the data into a training sample set and a test set;
S2, establishing a transformer fault diagnosis model based on the AdaCost algorithm: let the training sample set be $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is the feature vector of a sample formed from the gases dissolved in oil and $y_i$ is the fault type label, with $x_i \in X$ and $y_i \in Y = \{+1, -1\}$; let the number of iterations be $T$, with $t = 1, 2, \ldots, T$; let the sample weight distribution of the $t$-th iteration be $D_t = (w_{t1}, w_{t2}, \ldots, w_{ti})$, $i = 1, 2, \ldots, m$, and
Figure BDA0002600355100000071
let the weak learner formed by the $t$-th iteration be $h_t(x)$;
S3, training on the training sample set with weight distribution $D_t$ to obtain the weak learner $h_t(x)$;
S4, calculating the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \ldots, T$, where $I(x)$ is the error function;
S5, calculating the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \ldots, T$;
S6, introducing cost factors and updating the weight distribution of the samples in the training sample set;
S7, letting $t$ take the values $1, 2, \ldots, T$ in sequence and iterating until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner;
S8, inputting the test set into the strong learner, and determining the fault type by voting.
In step S4, the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \ldots, T$, is calculated by the following formula:
Figure BDA0002600355100000072
In step S5, the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \ldots, T$, is calculated by the following formula:
Figure BDA0002600355100000073
In step S6, the cost factor is introduced and the weight distribution of the samples in the training sample set is updated according to the following formula:
Figure BDA0002600355100000081
where $\beta_i$ is a penalty factor obtained from the cost matrix, and $Z_t$ is a normalization factor that ensures the sample weights sum to 1, calculated as follows:
Figure BDA0002600355100000082
In step S7, $t$ takes the values $1, 2, \ldots, T$ in sequence, the iteration is repeated until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, and all weak learners are integrated through a combination strategy to form a strong learner, specifically:
the strong classifier $H(x)$ is expressed as follows:
Figure BDA0002600355100000083
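Steps S2 to S7 can be sketched end to end on a toy problem. Everything below is an illustrative assumption rather than the patent's implementation: the weak learner is a one-dimensional decision stump, $\alpha_t$ uses the ordinary AdaBoost form, each sample carries a cost `c` in [0, 1], and the cost factor is taken as `0.5 - 0.5 * c` for correctly classified samples and `0.5 + 0.5 * c` for misclassified ones. The function names and the data are hypothetical.

```python
import math

def stump_predict(thr, pol, x):
    """Decision stump: predict pol when x >= thr, otherwise -pol."""
    return pol if x >= thr else -pol

def train_stump(xs, ys, w):
    """S3/S4: return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_predict(thr, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adacost_fit(xs, ys, costs, rounds=5):
    m = len(xs)
    w = [1.0 / m] * m                              # S2: uniform initial weights
    model = []
    for _ in range(rounds):                        # S7: repeat for t = 1..T
        err, thr, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)    # S5 (AdaBoost form, assumed)
        model.append((alpha, thr, pol))
        updated = []
        for x, y, c, wi in zip(xs, ys, costs, w):  # S6: cost-sensitive update
            margin = y * stump_predict(thr, pol, x)    # +1 correct, -1 wrong
            beta = 0.5 - 0.5 * c if margin > 0 else 0.5 + 0.5 * c  # assumed rule
            updated.append(wi * math.exp(-alpha * margin * beta))
        z = sum(updated)                           # normalization factor Z_t
        w = [u / z for u in updated]
    return model, w

def adacost_predict(model, x):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(thr, pol, x) for alpha, thr, pol in model)
    return 1 if score >= 0 else -1

# Toy data: +1 = fault (high cost 0.9), -1 = normal (low cost 0.1).
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, 1, -1, 1, 1]
costs = [0.1, 0.1, 0.9, 0.1, 0.9, 0.9]
model, weights = adacost_fit(xs, ys, costs, rounds=1)
```

After one round on this data, the single misclassified sample holds the largest weight, and the correctly classified high-cost samples keep more weight than the correctly classified low-cost ones, which matches the "rise quickly, fall slowly" behaviour described for high-cost samples.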
The feasibility and effectiveness of applying AdaCost to transformer fault diagnosis are verified using dissolved-gas-in-oil data of power transformers as the sample set, with macro F1 as the evaluation index of the model, denoted $\alpha_{\text{macro-F1}}$. According to the IEC 60599 standard, transformer faults are classified into 6 types: partial discharge, low-energy discharge, high-energy discharge, low-temperature overheating, medium-temperature overheating, and high-temperature overheating. Because long-term discharge raises the transformer oil temperature and produces a thermal fault, discharge with overheating is added as the 7th fault type.
As described above, transformer fault diagnosis is a multi-class problem, so a one-versus-one strategy is adopted: 28 classifiers are constructed and trained for the 8 transformer states (normal, partial discharge, low-energy discharge, high-energy discharge, low-temperature overheating, medium-temperature overheating, high-temperature overheating, and discharge with overheating). During testing, each sample is input into the classifiers and the fault type is determined by voting. The AdaCost-based transformer fault diagnosis model is shown in FIG. 4.
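The count of 28 classifiers follows from pairing the 8 states: 8 × 7 / 2 = 28. The sketch below enumerates the pairs and shows majority voting; `pairwise_predict` is a hypothetical callable standing in for the trained binary classifiers, and the state names are taken from the list above.

```python
from collections import Counter
from itertools import combinations

STATES = [
    "normal", "partial discharge", "low-energy discharge",
    "high-energy discharge", "low-temperature overheating",
    "medium-temperature overheating", "high-temperature overheating",
    "discharge with overheating",
]

# One binary classifier per unordered pair of states: C(8, 2) = 28.
PAIRS = list(combinations(STATES, 2))

def vote(pairwise_predict, x):
    """Ask every pairwise classifier for a winner and return the most voted state.

    pairwise_predict(a, b, x) is assumed to return either a or b for sample x.
    """
    votes = Counter(pairwise_predict(a, b, x) for a, b in PAIRS)
    return votes.most_common(1)[0][0]
```

A state that wins all of its pairings collects 7 votes, more than any other state can reach, so it is returned regardless of how the remaining pairings turn out.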
A cost matrix formed by expert scoring is strongly subjective. The role of $\beta_i$ is to increase the weight of high-cost misclassified samples greatly and to reduce the weight of high-cost correctly classified samples only modestly, and the confusion matrix reflects the number of misjudged samples (classes with more misjudged samples are regarded as deserving higher weight). The cost matrix N is therefore derived from the confusion matrix obtained by decision tree training.
Figure BDA0002600355100000091
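The text does not spell out how the cost matrix is computed from the confusion matrix, and the figure is not reproduced here. The sketch below shows one hypothetical mapping, offered only as an illustration: each off-diagonal cost is 1 plus the fraction of true class-i samples that the baseline classifier misjudged as class j, so that more frequent confusions cost more, every misclassification costs more than a correct prediction, and the diagonal stays at 0.

```python
def cost_from_confusion(conf):
    """Derive a cost matrix from a confusion matrix (hypothetical rule).

    conf[i][j] is the number of true class-i samples predicted as class j.
    Returns cost[i][j] = 1 + conf[i][j] / row_total for i != j, and 0 on
    the diagonal.
    """
    n = len(conf)
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = sum(conf[i])
        for j in range(n):
            if i != j:
                share = conf[i][j] / total if total else 0.0
                cost[i][j] = 1.0 + share
    return cost

# Hypothetical 2-class confusion matrix from a baseline decision tree.
conf = [[8, 2],
        [5, 5]]
cost = cost_from_confusion(conf)
```

In this example the second class is misjudged more often than the first, so its misclassification cost (1.5) is higher than that of the first class (1.2).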
An evaluation index suited to unbalanced-data classification is selected: the overall classification accuracy macro F1, denoted $\alpha_{\text{macro-F1}}$, is taken as the evaluation index of the classifier. Comparing the transformer fault diagnosis results of the decision tree model and the AdaCost model, the AdaCost model raises $\alpha_{\text{macro-F1}}$ by 12.1 percentage points over the decision tree model ($\alpha_{\text{macro-F1}} = 0.7152$), a relative improvement of about 16.91%. This shows that the method handles the imbalance of transformer fault data sets well and improves both the recognition of fault samples and the overall classification accuracy. The AdaCost algorithm also takes the differing costs of misclassification into account, which is consistent with engineering practice.
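Macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its size. A minimal sketch, with a hypothetical function name and toy labels that are not data from the patent:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example: class 0 is the majority, class 1 the minority.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
score = macro_f1(y_true, y_pred)   # mean of F1(class 0) and F1(class 1)
```

Because the minority class contributes half of the score here, a classifier that ignores it is penalized far more than it would be under plain accuracy.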
The transformer fault diagnosis method based on cost sensitivity and ensemble learning provided by the invention builds a transformer fault diagnosis model on the AdaCost algorithm. AdaCost is an improvement on AdaBoost that modifies AdaBoost's weight update strategy: a cost factor $\beta_i$ is introduced into $D_t(x)$, and the matrix formed by the $\beta_i$ is called the cost matrix. As a result, the weight of a high-cost misclassified sample is increased greatly, while the weight of a high-cost correctly classified sample is reduced only modestly. The overall idea is that the weights of high-cost samples rise quickly and fall slowly, which solves the problem of low overall classifier precision on unbalanced data sets. Because AdaCost takes the differing costs of misclassification into account, it handles the imbalance of transformer fault data sets well and improves both the recognition of fault samples and the overall classification accuracy. The AdaCost algorithm is suitable not only for fault diagnosis but also for other classification applications with unbalanced data, and generalizes well.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (5)

1. A transformer fault diagnosis method based on cost sensitivity and ensemble learning, characterized by comprising the following steps:
S1, preprocessing transformer fault data of various types, and dividing the data into a training sample set and a test set;
S2, establishing a transformer fault diagnosis model based on the AdaCost algorithm: let the training sample set be $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is the feature vector of a sample formed from the gases dissolved in oil and $y_i$ is the fault type label, with $x_i \in X$ and $y_i \in Y = \{+1, -1\}$; let the number of iterations be $T$, with $t = 1, 2, \ldots, T$; let the sample weight distribution of the $t$-th iteration be $D_t = (w_{t1}, w_{t2}, \ldots, w_{ti})$, $i = 1, 2, \ldots, m$, and
Figure FDA0002600355090000011
let the weak learner formed in the $t$-th iteration be $h_t(x)$;
S3, training on the training sample set with weight distribution $D_t$ to obtain the weak learner $h_t(x)$;
S4, calculating the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \dots, T$, where $I(\cdot)$ is the error (indicator) function;
S5, calculating the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \dots, T$;
S6, introducing cost factors and updating the weight distribution of each sample in the training sample set;
S7, letting $t$ take the values $1, 2, \dots, T$ in sequence and iterating repeatedly until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner;
S8, inputting the test set into the strong learner and determining the fault type by voting.
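Steps S2 to S8 can be illustrated with a minimal Python sketch. Everything not stated in the claim is an assumption made for illustration: the weak learner is a decision stump, the initial weights are uniform, the learner weight uses the standard AdaBoost form, and the penalty factor follows one choice from the AdaCost literature. The data are synthetic and the names `fit_stump`, `adacost_fit`, `adacost_predict`, and `cost` are hypothetical.

```python
import numpy as np

def stump_predict(stump, X):
    """Predict +1/-1 with a one-feature threshold rule."""
    j, thr, pol = stump
    return pol * np.where(X[:, j] >= thr, 1, -1)

def fit_stump(X, y, w):
    """Search all (feature, threshold, polarity) for the lowest weighted error."""
    best_err, best = np.inf, (0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = np.sum(w[stump_predict((j, thr, pol), X) != y])
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best

def adacost_fit(X, y, cost, T=10):
    """Boosting loop for S2-S7; cost[i] in [0, 1] is an assumed per-sample cost."""
    m = len(y)
    w = np.full(m, 1.0 / m)                    # S2: uniform initial weights (assumed)
    ensemble = []
    for _ in range(T):
        stump = fit_stump(X, y, w)             # S3: weak learner trained under D_t
        pred = stump_predict(stump, X)
        e_raw = np.sum(w[pred != y])           # S4: weighted error rate
        e = np.clip(e_raw, 1e-10, 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - e) / e)      # S5: standard AdaBoost weight (assumed)
        # S6: assumed penalty factor: costly samples gain more weight when
        # misclassified and lose less weight when classified correctly.
        beta = np.where(pred == y, 0.5 - 0.5 * cost, 0.5 + 0.5 * cost)
        w = w * np.exp(-alpha * y * pred * beta)
        w = w / w.sum()                        # normalization factor Z_t
        ensemble.append((alpha, stump))
        if e_raw <= 1e-10:                     # S7: stop once the error requirement is met
            break
    return ensemble

def adacost_predict(ensemble, X):
    """S8: weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(s, X) for alpha, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Synthetic stand-in for dissolved-gas features; +1 marks the costlier fault class.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)
cost = np.where(y == 1, 1.0, 0.3)
train, test = np.arange(90), np.arange(90, 120)   # S1: train/test split
model = adacost_fit(X[train], y[train], cost[train], T=15)
y_hat = adacost_predict(model, X[test])
print("test accuracy:", np.mean(y_hat == y[test]))
```

The labels here are binary, matching $Y = \{+1, -1\}$ in step S2; handling several fault types would need an additional scheme such as one-versus-rest, which the claim does not specify.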
2. The transformer fault diagnosis method based on cost sensitivity and ensemble learning according to claim 1, wherein step S4, calculating the learning error rate $e_t$ of $h_t(x)$, $t = 1, 2, \dots, T$, specifically uses the following calculation formula:
Figure FDA0002600355090000012
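The formula image for this claim is not reproduced in the text. For orientation only, the weighted error rate in standard AdaBoost takes the form below; it is an assumption that the claim uses the same expression. Here $I(\cdot)$ equals 1 when its argument is true and 0 otherwise.

```latex
\[
e_t = \sum_{i=1}^{m} w_{ti}\, I\bigl(h_t(x_i) \neq y_i\bigr), \qquad t = 1, 2, \dots, T
\]
```

For example, with $m = 5$ equally weighted samples ($w_{ti} = 0.2$) of which one is misclassified, this gives $e_t = 0.2$.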
3. The transformer fault diagnosis method based on cost sensitivity and ensemble learning according to claim 1, wherein step S5, calculating the weight $\alpha_t$ that $h_t(x)$ carries in forming the strong classifier, $t = 1, 2, \dots, T$, specifically uses the following calculation formula:
Figure FDA0002600355090000013
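The formula image for this claim is likewise not reproduced. As a hedged reference point, standard AdaBoost assigns the weak learner weight as follows; whether the claim uses exactly this form is an assumption.

```latex
\[
\alpha_t = \frac{1}{2}\ln\frac{1 - e_t}{e_t}
\]
```

With $e_t = 0.2$ this gives $\alpha_t = \tfrac{1}{2}\ln 4 \approx 0.693$; the weight grows as the error rate falls and is zero at $e_t = 0.5$.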
4. The transformer fault diagnosis method based on cost sensitivity and ensemble learning according to claim 1, wherein step S6, introducing cost factors and updating the weight distribution of each sample in the training sample set, specifically comprises:
Figure FDA0002600355090000021
wherein $\beta_i$ is a penalty factor obtained from a cost matrix, and $Z_t$ is a normalization factor that ensures the sample weights sum to 1, calculated as follows:
Figure FDA0002600355090000022
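Neither formula image in this claim is reproduced in the text. A common way to write a cost-adjusted boosting update consistent with the symbols named above ($\beta_i$, $Z_t$, $\alpha_t$) is sketched below; the exact expression in the claim may differ, so this is an assumption.

```latex
\[
w_{t+1,i} = \frac{w_{ti}}{Z_t}\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\, \beta_i\bigr),
\qquad
Z_t = \sum_{i=1}^{m} w_{ti}\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\, \beta_i\bigr)
\]
```

With $\beta_i = 1$ for every sample this reduces to the ordinary AdaBoost update. As a check, take five samples of weight 0.2, one of them misclassified, and $\alpha_t = \ln 2$: the unnormalized weights are 0.1 for each correct sample and 0.4 for the misclassified one, so $Z_t = 0.8$ and the new weights are 0.125 and 0.5. A larger $\beta_i$ on a misclassified high-cost sample raises its share further.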
5. The transformer fault diagnosis method based on cost sensitivity and ensemble learning according to claim 1, wherein step S7, letting $t$ take the values $1, 2, \dots, T$ in sequence and iterating repeatedly until the number of iterations $T$ at which the learning error rate meets the error-rate requirement is reached, then integrating all weak learners through a combination strategy to form a strong learner, specifically comprises:
the strong classifier $H(x)$ is expressed as follows:
Figure FDA0002600355090000023
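The formula image is not reproduced here. In AdaBoost-style methods the strong classifier, written $H(x)$ in this sketch, is usually the sign of the weighted vote of the weak learners; this is assumed to be the combination strategy meant by the claim.

```latex
\[
H(x) = \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Bigr)
\]
```

For instance, three weak learners with weights 0.7, 0.4, and 0.2 that output $+1$, $-1$, and $-1$ give a sum of $0.1$, so $H(x) = +1$: the single most reliable learner outvotes the other two.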
CN202010721965.5A 2020-07-24 2020-07-24 Transformer fault diagnosis method based on cost sensitivity and integrated learning Pending CN111860658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010721965.5A CN111860658A (en) 2020-07-24 2020-07-24 Transformer fault diagnosis method based on cost sensitivity and integrated learning

Publications (1)

Publication Number Publication Date
CN111860658A true CN111860658A (en) 2020-10-30

Family

ID=72950503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010721965.5A Pending CN111860658A (en) 2020-07-24 2020-07-24 Transformer fault diagnosis method based on cost sensitivity and integrated learning

Country Status (1)

Country Link
CN (1) CN111860658A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930861A (en) * 2016-04-13 2016-09-07 西安西拓电气股份有限公司 Adaboost algorithm based transformer fault diagnosis method
CN108304884A (en) * 2018-02-23 2018-07-20 华东理工大学 A kind of cost-sensitive stacking integrated study frame of feature based inverse mapping
CN109376801A (en) * 2018-12-04 2019-02-22 西安电子科技大学 Blade of wind-driven generator icing diagnostic method based on integrated deep neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
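WEI FAN et al.: "AdaCost: Misclassification Cost-Sensitive Boosting", Proceedings of the Sixteenth International Conference on Machine Learning *
崔宇 et al.: "Power transformer fault diagnosis method considering imbalanced case samples", High Voltage Engineering *
李勇 et al.: "A survey of ensemble classification algorithms for imbalanced data", Application Research of Computers *
谭浩; 田爱奎; 吴志勇: "A cost-sensitive ensemble algorithm for class imbalance", Journal of Shandong University of Technology (Natural Science Edition) *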

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308146A (en) * 2020-11-02 2021-02-02 国网福建省电力有限公司 Distribution transformer fault identification method based on operation characteristics
CN112580715A (en) * 2020-12-16 2021-03-30 珠海格力电器股份有限公司 Household equipment fault detection method, device, equipment and medium
CN112633900A (en) * 2020-12-16 2021-04-09 北京国电通网络技术有限公司 Industrial Internet of things data verification method based on machine learning
CN112580715B (en) * 2020-12-16 2024-05-07 珠海格力电器股份有限公司 Household equipment fault detection method, device, equipment and medium
CN112733913A (en) * 2020-12-31 2021-04-30 浙江禾连网络科技有限公司 Child and old person cooperative property safety detection method based on cost Adaboost algorithm
JP7511496B2 (en) 2021-01-27 2024-07-05 株式会社日立製作所 Incremental reinforcement learning apparatus and method
CN113702728A (en) * 2021-07-12 2021-11-26 广东工业大学 Transformer fault diagnosis method and system based on combined sampling and LightGBM
CN113419519B (en) * 2021-07-14 2022-05-13 北京航空航天大学 Electromechanical product system or equipment real-time fault diagnosis method based on width learning
CN113419519A (en) * 2021-07-14 2021-09-21 北京航空航天大学 Electromechanical product system or equipment real-time fault diagnosis method based on width learning
CN113569957A (en) * 2021-07-29 2021-10-29 中国工商银行股份有限公司 Object type identification method and device of business object and storage medium
CN113610148B (en) * 2021-08-04 2024-02-02 北京化工大学 Fault diagnosis method based on bias weighted AdaBoost
CN113610148A (en) * 2021-08-04 2021-11-05 北京化工大学 Fault diagnosis method based on bias weighting AdaBoost
CN113780394A (en) * 2021-08-31 2021-12-10 厦门理工学院 Training method, device and equipment of strong classifier model
CN113780394B (en) * 2021-08-31 2023-05-09 厦门理工学院 Training method, device and equipment for strong classifier model
CN113935440A (en) * 2021-12-15 2022-01-14 武汉格蓝若智能技术有限公司 Iterative evaluation method and system for error state of voltage transformer
CN114511399A (en) * 2022-02-15 2022-05-17 电子科技大学 Abnormal data screening method for internet financial wind control
CN114511399B (en) * 2022-02-15 2023-12-15 电子科技大学 Abnormal data identification and elimination method
CN114548306A (en) * 2022-02-28 2022-05-27 西南石油大学 Intelligent monitoring method for early drilling overflow based on misclassification cost
CN114722923B (en) * 2022-03-22 2024-02-27 西北工业大学 Lightweight electromechanical equipment fault diagnosis method
CN114722923A (en) * 2022-03-22 2022-07-08 西北工业大学 Light electromechanical equipment fault diagnosis method
CN114997063A (en) * 2022-06-17 2022-09-02 华北电力大学 Power grid transient stability prediction method and system based on cost sensitive support vector machine
CN115407753B (en) * 2022-08-18 2024-02-09 广东元梦泽技术服务有限公司 Industrial fault diagnosis method for multi-variable weighting integrated learning
CN115407753A (en) * 2022-08-18 2022-11-29 广东元梦泽技术服务有限公司 Industrial fault diagnosis method for multivariate weighted ensemble learning
CN117786538A (en) * 2023-12-06 2024-03-29 国网上海市电力公司 CsAdaBoost integrated learning algorithm based on cost sensitivity improvement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201030