CN115712871A

CN115712871A - Power electronic system fault diagnosis method combining resampling and integrated learning

Info

Publication number: CN115712871A
Application number: CN202211510375.3A
Authority: CN
Inventors: 苟斌; 邓清丽; 冯晓云; 葛兴来; 林春旭; 杨顺风; 谢东; 王惠民
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2023-02-24

Abstract

The invention discloses a power electronic system fault diagnosis method combining resampling and integrated learning, which belongs to the technical field of power electronic equipment fault diagnosis and comprises the steps of sampling current data in a half fundamental wave period of a power electronic system in real time, and conducting per-unit processing on the current data to obtain per-unit data; obtaining the frequency domain characteristics of the per-unit data based on a fast Fourier transform algorithm according to the per-unit data; obtaining a feature vector of per-unit data through a feature extraction selector according to the frequency domain features; and according to the characteristic vector of the per-unit data, obtaining a fault category label by using the integrated classification model, and completing fault diagnosis of the power electronic system. The invention solves the technical problems of inaccurate diagnosis and even wrong judgment of the existing machine learning model caused by unbalanced original data samples in the existing fault diagnosis method, and effectively realizes the diagnosis of various different fault types of a sensor and a power device of a power electronic system.

Description

Power electronic system fault diagnosis method combining resampling and integrated learning

Technical Field

The invention belongs to the technical field of power electronic equipment fault diagnosis, and particularly relates to a power electronic system fault diagnosis method combining resampling and integrated learning.

Background

Power electronic converters play an indispensable role in energy conversion systems and are widely used in photovoltaic power generation, railway electric traction and transportation, battery charging, aerospace systems, and similar fields. Power electronic equipment is fault-prone, and the power electronic converter is one of its common fault sources, so accurate fault diagnosis is important both for fault-tolerant operation control after a fault and for subsequent maintenance of the system.

With the rapid development of artificial intelligence and data science, many data-driven fault diagnosis methods have been proposed to address the difficulties of fault diagnosis in power electronic converter systems, such as the variety of fault types and inaccurate mathematical models. In most cases, however, the intelligent diagnostic model is trained on an idealized data set: one that not only has enough samples and little noise but also has a balanced distribution of samples across classes. In practice, a power electronic converter system is rarely in a fault state, so the original historical monitoring data are unbalanced overall, and normal-operation samples outnumber fault samples. Because an intelligent learning method pays equal attention to every sample, it tends to ignore the few fault samples; even when the diagnostic model has high training accuracy, its diagnostic performance on the minority fault samples is poor and fault types are misjudged. For fault diagnosis of power electronic systems under data imbalance, there is therefore an urgent need for a classifier that improves the diagnostic accuracy on the minority fault samples without seriously sacrificing accuracy on the majority normal samples.

Disclosure of Invention

Aiming at the defects in the prior art, the power electronic system fault diagnosis method combining resampling and integrated learning provided by the invention solves the technical problems of inaccurate diagnosis and even wrong judgment of the existing machine learning model caused by unbalanced original data samples in the existing fault diagnosis method, and effectively realizes diagnosis of various different fault types of sensors and power devices of a power electronic system.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that: a power electronic system fault diagnosis method combining resampling and integrated learning comprises the following steps:

S1, sampling current data in a half fundamental wave period of a power electronic system in real time, and conducting per-unit processing on the current data to obtain per-unit data;

S2, obtaining the frequency domain characteristics of the per-unit data based on a fast Fourier transform algorithm according to the per-unit data;

S3, obtaining a feature vector of the per-unit data through a feature extraction selector according to the frequency domain features of the per-unit data;

S4, obtaining a fault category label by utilizing an integrated classification model according to the feature vector of the per-unit data, and completing fault diagnosis of the power electronic system.

The invention has the beneficial effects that: the invention effectively utilizes the feature extraction selector to realize the dimension reduction of the feature vector, so that the resampling algorithm obtains better effect under different data balance rates; the accuracy and the classification capability of the classifier under unbalanced data are further improved by the aid of the integrated learning idea; the method can accurately diagnose the faults of the power device and the sensor, realize real-time online identification of the faults, find abnormal problems in the power electronic system and improve the maintenance efficiency.

Further, the expression of the per-unit processing in step S1 is:

$$x_{in} = \frac{x_i}{\max(x_1, x_2, \ldots, x_N)}$$

where $x_{in}$ is the per-unit value of the data, $x_i$ is the true value of the data, $\max(\cdot)$ is the maximum-value function, and $x_N$ is the N-th data point of the data sample.

The beneficial effects of the above further scheme are: per-unit processing puts data acquired under different operating conditions and load conditions on a common scale, so that the characteristics and parameters of each element of the power system are easy to compare.
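The per-unit step can be illustrated with a minimal Python sketch. The function name `per_unit` and the sample values are hypothetical; the code divides by the signed maximum exactly as the expression is written, which assumes the window contains a positive peak.

```python
def per_unit(samples):
    """Scale one window of current samples by the window maximum."""
    peak = max(samples)
    if peak == 0:
        raise ValueError("window maximum is zero; per-unit value is undefined")
    return [x / peak for x in samples]


window = [2.0, 4.0, 8.0, 4.0, 2.0]  # hypothetical current samples
print(per_unit(window))             # [0.25, 0.5, 1.0, 0.5, 0.25]
```

After scaling, windows recorded at different load levels share the same peak value of 1, which is what makes them comparable.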

Further, the construction of the feature extraction selector in the step S3 includes the following steps:

A1, acquiring normal data samples and fault data samples of the power electronic system, and preprocessing and per-unit processing them to obtain an initial data set;

A2, obtaining frequency domain features of the initial data set based on a fast Fourier transform algorithm according to the initial data set;

A3, according to the frequency domain features of the initial data set, evaluating the correlation between each feature attribute in the frequency domain features and the fault category by adopting the feature weighting algorithm ReliefF to obtain the feature weight of each feature attribute, wherein the expression of the feature weighting algorithm ReliefF is as follows:

$$W(A) = W(A) - \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R, H_j)}{mk} + \sum_{C \neq \mathrm{class}(R)} \left[ \frac{p(C)}{1 - p(\mathrm{class}(R))} \sum_{j=1}^{k} \frac{\mathrm{diff}(A, R, M_j(C))}{mk} \right]$$

$$\mathrm{diff}(A, R_1, R_2) = \frac{\left| R_1[A] - R_2[A] \right|}{\max(A) - \min(A)}$$

where $W(A)$ is the weight of the A-th feature, $A$ is the feature number, $R$ is a sample, $H_j$ is the j-th nearest neighbor of $R$ in the same class, $j$ is the nearest neighbor number, $k$ is the total number of selected nearest neighbors, $\mathrm{diff}(A, R, H_j)$ is the difference between sample $R$ and sample $H_j$ on feature $A$, $p(\cdot)$ is the prior probability of a class, $m$ is the number of sampling iterations, $\mathrm{class}(R)$ is the class label to which $R$ belongs, $M_j(C)$ is the j-th nearest neighbor sample in class $C$, $C$ is a class label, $\mathrm{diff}(A, R_1, R_2)$ is the difference between sample $R_1$ and sample $R_2$ on feature $A$, $R_1$ and $R_2$ are generic symbols for sample data, and $\max(\cdot)$ and $\min(\cdot)$ are the maximum and minimum functions;

A4, according to the feature weight of each feature attribute, eliminating the feature attributes whose weight is less than zero to obtain a first feature attribute set;

A5, according to the first feature attribute set, selecting the m feature attributes with the maximum average mutual information to obtain a feature subset for the fault category:

$$\max D(S, c), \quad D(S, c) = \frac{1}{|S|} \sum_{z_i \in S} I(z_i; c)$$

$$I(z_i; c) = \iint p(z_i, c) \log \frac{p(z_i, c)}{p(z_i)\, p(c)} \, \mathrm{d}z_i \, \mathrm{d}c$$

where $D(S, c)$ is the average mutual information between the feature subset $S$ and the fault category $c$, $I(\cdot)$ is the mutual information measure, $z_i$ is the i-th feature attribute, $i$ is the feature attribute number, $p(z_i)$ is the marginal probability density function of the i-th feature attribute, $p(c)$ is the marginal probability density function of the fault category $c$, and $p(z_i, c)$ is the joint probability density function of $z_i$ and $c$;

A6, according to the feature subset, adding a minimum redundancy condition to select m mutually exclusive feature attributes to obtain a feature set with maximum correlation and minimum redundancy:

$$\min R(S), \quad R(S) = \frac{1}{|S|^2} \sum_{z_i, z_j \in S} I(z_i; z_j)$$

$$\mathrm{mRMR} = \max(D - R)$$

where mRMR is the feature set with maximum correlation and minimum redundancy, $R$ is the result of the minimum redundancy condition, $z_j$ is the j-th feature attribute, and $j$ is the feature attribute number;

A7, obtaining an evaluation result according to the feature set with maximum correlation and minimum redundancy;

A8, sorting the feature attributes of the first feature attribute set according to the evaluation result, and selecting the 20 top-ranked feature attributes to obtain a feature vector;

A9, judging whether m is the smallest value for which the test accuracy of the feature vector exceeds 95%; if so, obtaining the feature extraction selector; otherwise, adjusting the value of m and returning to step A5.

The beneficial effects of the above further scheme are: the introduction of the feature extraction selector can extract more feature data, realize the dimension reduction of the feature vector, make the boundary of different types of samples of the new feature vector after the feature selection clearer, and lay a good foundation for data resampling.
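Steps A3 and A4 can be sketched in plain Python as follows. This is an illustrative ReliefF under stated assumptions, not the patented implementation: every sample is used once as the reference sample R, neighbors are ranked by the summed normalized per-feature difference, and the function name `relieff` and the toy data are hypothetical.

```python
def relieff(X, y, k=2):
    """Return one ReliefF weight per feature for samples X with labels y."""
    n, d = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(d)]
    hi = [max(row[a] for row in X) for a in range(d)]
    span = [(hi[a] - lo[a]) or 1.0 for a in range(d)]  # guard constant features

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: y.count(c) / n for c in set(y)}
    w = [0.0] * d
    for i in range(n):  # each sample serves as R once, so m equals n here
        r, cr = X[i], y[i]
        for c in prior:
            pool = [X[j] for j in range(n) if j != i and y[j] == c]
            near = sorted(pool, key=lambda s: dist(r, s))[:k]
            if not near:
                continue
            # same-class neighbors (hits) lower the weight; other-class
            # neighbors (misses) raise it, scaled by the class prior
            scale = -1.0 if c == cr else prior[c] / (1.0 - prior[cr])
            for a in range(d):
                w[a] += scale * sum(diff(a, r, s) for s in near) / (n * len(near))
    return w


# toy data: feature 0 separates the two classes, feature 1 does not
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=2)
kept = [a for a, wa in enumerate(weights) if wa >= 0]  # step A4: drop negative weights
```

On this toy set the discriminative feature receives a positive weight and the uninformative one a negative weight, so only feature 0 survives the cull in step A4.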

Further, the method for constructing the integrated classification model in step S4 includes the following steps:

B1, obtaining a new feature vector of the initial data set according to the feature extraction selector;

B2, extracting the initial data set according to the new feature vector to obtain a first data set;

B3, resampling the fault data in the first data set by adopting the safety level oversampling algorithm safe-level SMOTE to obtain a balanced data set;

B4, according to the balanced data set, training a plurality of different RVFL classifiers by adjusting parameters of the RVFL network model; the expression of the output function of the RVFL network model is as follows:

$$f(X) = \sum_{j=1}^{J} \beta_j\, g(\omega_j \cdot X + b_j) + \sum_{j=J+1}^{J+N} \beta_j x_j$$

where $f(X)$ is the output function value of the RVFL network model, $X$ is the input vector of the RVFL network model, $\omega_j$ and $b_j$ are respectively the weight and bias of the j-th hidden node between the input layer and the hidden layer, $g$ is the activation function, $\beta_j$ is the output weight, $j$ is the hidden node number, $J$ is the number of hidden nodes, $N$ is the number of input data, and $x_j$ is the j-th data in the input vector;

B5, evaluating the plurality of RVFL classifiers, selecting the RVFL classifiers whose evaluation value reaches a preset value, and obtaining an integrated classifier by utilizing integrated learning;

B6, adding the output decision of the integrated classification model to the integrated classifier to obtain the integrated classification model.

The beneficial effects of the above further scheme are: the problem of data imbalance is solved by adopting a resampling technology, and the data are classified by utilizing a plurality of RVFL classifiers, so that the classification accuracy is improved.

Further, the step B3 includes the steps of:

B301, obtaining a fault data set and a normal data set according to the first data set;

B302, according to the fault data set, obtaining a safety level ratio:

$$S_{lr} = \frac{S_{lp}}{S_{ln}}$$

where $S_{lr}$ is the safety level ratio, $S_{lp}$ is the number of instances in the fault data set among the k nearest neighbors of sample p, p is a sample of the fault data set, $S_{ln}$ is the number of instances in the fault data set among the k nearest neighbors of sample n, and n is a nearest neighbor sample of sample p;

B303, obtaining a new sample of the fault data set according to the safety level ratio:

$$x_{new} = p + \beta (n - p)$$

where $x_{new}$ is a new sample of the fault data set, and $\beta$ is a coefficient whose value depends on the safety level ratio;

B304, judging whether the number of new samples of the fault data set is equal to the number of samples of the normal data set; if so, updating the fault data set in the first data set with the new samples of the fault data set to obtain a balanced data set; otherwise, updating the fault data set with the new samples of the fault data set and returning to step B302.

The beneficial effects of the above further scheme are: the data set is balanced, and the drawback of SMOTE, whose randomly placed synthetic samples may overlap with normal data samples, is overcome.

Further, in the step B5, the RVFL classifiers are evaluated by using the metrics F-score and G-means; the expression of the F-score is as follows:

$$F\text{-score} = \frac{(1 + \beta_1^2)\, P R}{\beta_1^2 P + R}$$

where F-score is a combined value of accuracy and recall, $\beta_1$ is a parameter measuring the relative importance of P and R, P is the accuracy of the RVFL classifier, and R is the recall rate of the RVFL classifier;

the expression of G-means is:

$$G\text{-means} = \sqrt{P \cdot R}$$

The beneficial effects of the above further scheme are: two metrics are adopted for evaluation, which avoids biased results; both metrics take the accuracy and recall rate of the RVFL classifier into account, so that low-quality RVFL classifiers are effectively excluded.

Further, the expression of the output decision of the integrated classification model in the step B6 is as follows:

$$Y_o = \max(Y_{c1}, Y_{c2}, Y_{c3}, \ldots, Y_{cn})$$

$$Y_{c1} = \mathrm{mean}(Y_{11}, Y_{12}, Y_{13}, \ldots, Y_{1n})$$

$$Y = \mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)}$$

where $Y_o$ is the output of the integrated classification model, $Y_{cn}$ is the probability average of the fault label cn, $Y_{c1}$ is the probability average of the fault label c1, $Y_{R1}$ is the probability of the fault label c1 from the R-th RVFL classifier, $\mathrm{mean}(\cdot)$ is the mean function, $Y$ is the output of an individual RVFL classifier in the integrated classifier, $y_i$ is the output value corresponding to the i-th label, $y_j$ is the output value corresponding to the j-th label, $\mathrm{softmax}(\cdot)$ is the normalized exponential function, and $\exp(\cdot)$ is the exponential function with the natural constant e as its base.

The beneficial effects of the above further scheme are: the decision can calculate the probability of various possible fault labels, and is convenient for workers to check.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of a single-phase pulse rectifier according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of average test accuracy results of feature vectors subjected to feature sorting by an original feature vector, an mRMR algorithm, a ReliefF and an mRMR combined algorithm in different feature dimensions in the embodiment of the present invention.

FIG. 4 is a block diagram of an integrated classification model test and diagnostic decision method of the present invention.

FIG. 5 is a schematic diagram of the grid-side voltage and current and DC-side voltage waveforms output by the converter, and of the change in the fault label output by the integrated classification model, before and after a T₁ open-circuit fault on the experimental prototype system of a single-phase pulse rectifier according to an embodiment of the present invention.

FIG. 6 is a schematic diagram of the grid-side voltage and current and DC-side voltage waveforms output by the converter, and of the change in the fault label output by the integrated classification model, before and after a T₃ open-circuit fault on the experimental prototype system of a single-phase pulse rectifier according to an embodiment of the present invention.

Fig. 7 is a schematic diagram of a change result of a converter output network side voltage current, a dc side voltage waveform, and an integrated classification model output fault label before and after a current sensor offset fault according to an embodiment of the present invention, which is made under an experimental prototype system of a single-phase pulse rectifier.

Fig. 8 is a schematic diagram of a change result of a converter output network side voltage current, a dc side voltage waveform, and an integrated classification model output fault label before and after a current sensor gain fault according to an embodiment of the present invention, which is made under an experimental prototype system of a single-phase pulse rectifier.

Detailed Description

The following description of the embodiments of the present invention is provided to facilitate the understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments, and it will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the invention as defined and defined in the appended claims, and all matters produced by the invention using the inventive concept are protected.

Example 1

In one embodiment of the present invention, as shown in fig. 1, a power electronic system fault diagnosis method combining resampling and integrated learning includes the following steps:

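S1, sampling current data in a half fundamental wave period of a power electronic system in real time, and conducting per-unit processing on the current data to obtain per-unit data;

S2, obtaining the frequency domain characteristics of the per-unit data based on a fast Fourier transform algorithm according to the per-unit data;

S3, obtaining a feature vector of the per-unit data through a feature extraction selector according to the frequency domain features of the per-unit data;

S4, obtaining a fault category label by using an integrated classification model according to the feature vector of the per-unit data, and completing fault diagnosis of the power electronic system.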

In this embodiment, the basic idea is to oversample the minority classes with a safety level oversampling algorithm (safe-level SMOTE), balancing the data set at the data level, and to exploit the fast computation of the random vector functional link (RVFL) network to train on the data set and generate RVFL network models. Following the idea of ensemble learning, the weights and parameters of the network model are varied to obtain diverse RVFL network models, which addresses the identification of minority fault samples in unbalanced data at the algorithm level. Because time-domain fault features in a power electronic converter system are easily submerged by noise and different faults have similar features in practice, the original data set is processed through data preprocessing, feature extraction, and feature selection to obtain low-dimensional, more relevant features. Based on the selected features, safe-level SMOTE oversamples the minority classes to balance the data set. A fast ensemble learning method is then built on the RVFL network model, the RVFL classifiers are evaluated with the metrics F-score and G-means, and finally the output decision of the integrated classification model performs a probability calculation to identify the fault mode accurately.

The expression of the per unit processing in step S1 is:

$$x_{in} = \frac{x_i}{\max(x_1, x_2, \ldots, x_N)}$$

In this embodiment, the original operating data of the power system are usually time-domain signals with periodicity and temporal order, so one fundamental wave period of the current waveform is taken as the window length for extracting feature data. To cover different operating conditions and load conditions, normal-operation data samples and fault data samples are acquired on the power electronic system experiment platform, and each group of samples is converted to per-unit values by dividing by the maximum value in that group.

The construction of the feature extraction selector in the step S3 includes the following steps:

In this embodiment, in consideration of the characteristic that the original samples of the power electronic system are few, a data set including a large number of normal samples, a small number of power device open circuit fault samples, and a small number of sensor fault samples is constructed, and 320 groups of data samples are obtained in total, wherein the ratio of the normal samples to the power device fault samples to the sensor fault samples is 2.

In this embodiment, in order to extract more feature data, the spectral features of each data sample are extracted using the fast Fourier transform and combined with the original time domain features to form a new feature vector.
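A minimal sketch of combining time domain samples with spectral magnitudes is given below. It uses a direct discrete Fourier transform from the Python standard library in place of an FFT routine (the result is the same, only slower); the function names and the test waveform are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Single-sided magnitude spectrum from a direct DFT, scaled by 1/N."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags


def feature_vector(signal):
    """Concatenate the time domain samples with their spectral magnitudes."""
    return list(signal) + dft_magnitudes(signal)


n = 16
wave = [math.sin(2 * math.pi * t / n) for t in range(n)]  # one full cycle
spectrum = dft_magnitudes(wave)
features = feature_vector(wave)
```

For a unit-amplitude sine that fits the window exactly, the energy appears in bin 1 with magnitude 0.5 and the remaining bins are near zero.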

In the ReliefF expression, $W(A)$ is the weight of the A-th feature, $A$ is the feature number, $R$ is a sample, $H_j$ is the j-th nearest neighbor of $R$, $j$ is the nearest neighbor number, $k$ is the total number of selected nearest neighbors, $\mathrm{diff}(A, R, H_j)$ is the difference between sample $R$ and sample $H_j$ on feature $A$, $p(\cdot)$ is the prior probability of a class, $m$ is the number of sampling iterations, $\mathrm{class}(R)$ is the class label to which $R$ belongs, $M_j(C)$ is the j-th nearest neighbor sample in class $C$, $C$ is a class label, $\mathrm{diff}(A, R_1, R_2)$ is the difference between sample $R_1$ and sample $R_2$ on feature $A$, $R_1$ and $R_2$ are generic symbols for sample data, and $\min(\cdot)$ is the minimum function;

In this embodiment, ReliefF is a feature weighting algorithm that assigns different weights to features according to the correlation between each feature attribute and the fault category: the larger the feature weight, the stronger the classification capability of the feature, and vice versa. A feature is therefore culled if its weight is below the weight threshold, which is set to zero in this work.

A5, according to the first feature attribute set, selecting the m feature attributes with the largest average mutual information to obtain a feature subset for the fault category:

In the mutual information expression, $D(S, c)$ is the average mutual information between the feature subset $S$ and the fault category $c$, $I(\cdot)$ is the mutual information measure, $z_i$ is the i-th feature attribute, $i$ is the feature attribute number, $p(z_i)$ is the marginal probability density function of the i-th feature attribute, $p(c)$ is the marginal probability density function of the fault category $c$, and $p(z_i, c)$ is the joint probability density function of $z_i$ and $c$;

$$\mathrm{mRMR} = \max(D - R)$$

In this embodiment, in order to reduce the dimension of the feature vector, the feature weighting algorithm ReliefF and the maximum-relevance minimum-redundancy (mRMR) algorithm are combined to remove redundant and uncorrelated components from the feature signal. First, the ReliefF algorithm evaluates the correlation between the feature attributes and the fault categories; feature attributes with a weight less than zero are removed and those with a weight greater than zero are retained. Next, the mRMR algorithm further evaluates the correlation between the retained feature attributes and the fault categories and the redundancy among the features, and the feature attributes are re-ranked according to the evaluation result. Finally, the 20 top-ranked feature attributes are selected as the new feature vector. By combining the two feature selection algorithms, each attribute in the fault feature vector can be ranked by its weight, its relevance to the classification label, and its redundancy. The higher-quality feature attributes are then used to construct a new feature vector of lower dimension, so that the boundaries between different classes of samples become clearer after feature selection, which lays a good foundation for data resampling.
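The mRMR ranking described above can be sketched as a greedy search. This is an illustration under assumptions the text does not state: features are already discretized (for example by binning), selection is incremental using the difference criterion D - R, and the names `mutual_info` and `mrmr` and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((nab / n) * math.log(nab * n / (pa[u] * pb[v]))
               for (u, v), nab in pab.items())


def mrmr(columns, labels, m):
    """Greedily pick m column indices maximizing relevance minus redundancy."""
    relevance = [mutual_info(col, labels) for col in columns]
    selected = []
    while len(selected) < m:
        best, best_score = None, -math.inf
        for i in range(len(columns)):
            if i in selected:
                continue
            if selected:
                redundancy = sum(mutual_info(columns[i], columns[j])
                                 for j in selected) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[i] - redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 0],  # informative feature
    [0, 0, 0, 1, 1, 1, 1, 0],  # exact duplicate of the first (redundant)
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of both the label and feature 0
]
picked = mrmr(columns, labels, m=2)  # [0, 2]
```

The duplicate column is as relevant as the first one but fully redundant with it, so the second pick goes to the non-redundant column instead.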

In this embodiment, in order to verify the validity of the feature selection result, the feature vectors ranked by the mRMR algorithm, the ReliefF algorithm, and the combined mRMR and ReliefF algorithm are compared. Common classification algorithms are trained and tested while feature vectors of increasing dimension are introduced step by step, and the average test accuracy is calculated to judge the quality of the feature attributes. Finally, the feature vector with the smallest dimension is selected on the premise that the feature vector quality remains high (test accuracy greater than 95%), which completes the selection of the most critical feature attributes.
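The final selection rule (smallest dimension whose test accuracy exceeds 95%) can be written as a short helper. The function name and the accuracy values below are made up purely for illustration and are not results from the embodiment.

```python
def smallest_dimension(accuracy_by_dim, threshold=0.95):
    """Return the smallest feature count whose accuracy exceeds threshold."""
    passing = [m for m, acc in accuracy_by_dim.items() if acc > threshold]
    return min(passing) if passing else None


# hypothetical average test accuracies for the top-m ranked features
scores = {5: 0.88, 10: 0.93, 15: 0.962, 20: 0.971, 25: 0.969}
chosen = smallest_dimension(scores)
```

Returning `None` when no dimension passes makes the "adjust m and return to step A5" branch explicit for the caller.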

The construction method of the integrated classification model in the step S4 comprises the following steps:

B3, resampling the fault data in the first data set by adopting the safety level oversampling algorithm safe-level SMOTE to obtain a balanced data set, including the following steps:

B302, according to the fault data set, obtaining a safety level ratio:

$$S_{lr} = \frac{S_{lp}}{S_{ln}}$$

where $S_{lr}$ is the safety level ratio, $S_{lp}$ is the number of instances in the fault data set among the k nearest neighbors of sample p, p is a sample of the fault data set, $S_{ln}$ is the number of instances in the fault data set among the k nearest neighbors of sample n, and n is a nearest neighbor sample of sample p;

$$x_{new} = p + \beta (n - p)$$

B304, judging whether the number of new samples of the fault data set is equal to the number of samples of the normal data set; if so, updating the fault data set in the first data set with the new samples of the fault data set to obtain a balanced data set; otherwise, updating the fault data set with the new samples of the fault data set and returning to step B302.

In this embodiment, based on the new feature vector obtained after feature extraction, the safe-level SMOTE algorithm is used to oversample the minority samples of power device faults and sensor faults, producing new samples and balancing the data set. Resampling techniques, which include oversampling and undersampling methods, are expected to overcome the challenges of unbalanced data. Oversampling is generally more suitable for a power electronic system with a small data set, because undersampling may discard important information and thereby reduce the test accuracy. Before generating synthetic samples, the safe-level SMOTE algorithm assigns a safety level to each instance; a new synthetic instance is created only in a safe region and closer to the instance with the larger safety level, and synthetic samples are generated randomly by selecting among the nearest neighbors of the minority samples. The safety level is calculated as follows:

First, the data set D is defined as the set of all minority-class sample data to be oversampled, and p is a sample in D. The k nearest neighbor samples of p are computed, and $S_{lp}$ equals the number of those neighbors that are instances of D. One of the nearest neighbor samples is then taken at random and denoted n; the k nearest neighbor samples of n are computed, and $S_{ln}$ equals the number of those neighbors that are instances of D.
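The safety level count can be sketched as follows. It assumes Euclidean distance and that neighbors are searched over both minority and majority samples; the function name `safe_level` and the toy points are hypothetical.

```python
def safe_level(point, minority, majority, k=5):
    """Count minority samples among the k nearest neighbors of point."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # tag each candidate neighbor with 1 (minority) or 0 (majority);
    # the point itself is excluded by identity
    tagged = [(dist2(point, s), 1) for s in minority if s is not point]
    tagged += [(dist2(point, s), 0) for s in majority]
    tagged.sort(key=lambda t: t[0])
    return sum(flag for _, flag in tagged[:k])


minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
majority = [(5.1, 5.0), (5.0, 5.1), (4.9, 5.0)]
sl_inside = safe_level(minority[0], minority, majority, k=2)   # deep in minority region
sl_outlier = safe_level(minority[3], minority, majority, k=3)  # surrounded by majority
```

A minority sample surrounded by majority samples gets a safety level of 0, which is what later suppresses synthesis around it.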

Then the quotient of $S_{lp}$ and $S_{ln}$ is calculated and defined as the safety level ratio $S_{lr}$, expressed as follows:

$$S_{lr} = \frac{S_{lp}}{S_{ln}}$$

The new sample $x_{new}$ is generated between samples p and n according to $S_{lr}$, as follows:

$$x_{new} = p + \beta (n - p)$$

where $\beta$ is chosen according to the safety level ratio. When $S_{lr} = \infty$ and $S_{lp} = 0$, no sample is generated; when $S_{lr} = \infty$ and $S_{lp} \neq 0$, $\beta = 0$, which is equivalent to copying sample p; when $S_{lr} = 1$, $\beta$ is a random number in the range $[0, 1]$; when $S_{lr} > 1$, $\beta$ is a random number in the range $[0, 1/S_{lr}]$; when $S_{lr} < 1$, $\beta$ is a random number in the range $[1 - S_{lr}, 1]$.
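The five cases for $\beta$ translate directly into code. The sketch below is illustrative only; the function names `gap_range` and `synthesize` are hypothetical, and an infinite ratio is detected by checking whether the neighbor's safety level is zero.

```python
import random


def gap_range(sl_p, sl_n):
    """Return the (low, high) interval for beta, or None if no sample is made."""
    if sl_n == 0:                      # the safety level ratio is infinite
        return None if sl_p == 0 else (0.0, 0.0)
    ratio = sl_p / sl_n
    if ratio == 1:
        return (0.0, 1.0)
    if ratio > 1:
        return (0.0, 1.0 / ratio)      # stay closer to the safer sample p
    return (1.0 - ratio, 1.0)          # stay closer to the safer neighbor n


def synthesize(p, n, sl_p, sl_n, rng=random):
    """Create x_new = p + beta * (n - p), or None when both levels are zero."""
    bounds = gap_range(sl_p, sl_n)
    if bounds is None:
        return None
    beta = rng.uniform(*bounds)
    return [pi + beta * (ni - pi) for pi, ni in zip(p, n)]
```

For example, `gap_range(4, 2)` gives `(0.0, 0.5)`, so the synthetic point lands in the half of the segment nearer to p, the sample with the higher safety level.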

This procedure is repeated until the required number of oversampled instances is reached, so that the number of minority-class samples matches the number of normal data samples. The result is a data set in which normal samples and fault data samples are balanced, 480 groups of samples in total.

In the output function of the RVFL network model, $f(X)$ is the output function value, $X$ is the input vector of the RVFL network model, $\omega_j$ and $b_j$ are respectively the weight and bias of the j-th hidden node between the input layer and the hidden layer, $g$ is the activation function, $\beta_j$ is the output weight, $j$ is the hidden node number, $J$ is the number of hidden nodes, $N$ is the number of input data, and $x_j$ is the j-th data in the input vector;

B5, evaluating the plurality of RVFL classifiers, selecting the classifiers whose evaluation value reaches a preset value, and obtaining an integrated classifier by utilizing integrated learning;

In this embodiment, an RVFL classifier is obtained by training an RVFL network model on the balanced data set, and different RVFL classifiers are obtained by adjusting the parameters of the RVFL network model. The RVFL classifiers are evaluated with the metrics F-score and G-means, and the best-performing RVFL classifiers are then combined, following the idea of ensemble learning, into an integrated classifier that completes the learning task.

Defining the input data $X = [x_1, x_2, \ldots, x_N]$ and the output data $Y = [y_1, y_2, \ldots, y_N]$, the output function of the RVFL network model can be described as:

$$f(X) = \sum_{j=1}^{J} \beta_j\, g(\omega_j \cdot X + b_j) + \sum_{j=J+1}^{J+N} \beta_j x_j$$

The RVFL network model randomly generates the weights and biases of the hidden layer neurons, and training consists of solving for the output weights by matrix operations using the Moore-Penrose pseudo-inverse. Provided the training accuracy is met, different RVFL classifiers are trained by adjusting the number of hidden layer nodes and the type of activation function of the RVFL network model; the RVFL classifiers are evaluated with the metrics F-score and G-means, and the integrated classifier comprising a plurality of RVFL classifiers is finally obtained.
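For reference, the pseudo-inverse training step is commonly written in matrix form as below. This is a sketch in assumed notation: the matrix names $H$, $W$, and $b$ do not appear in the source, with $W$ and $b$ collecting the randomly generated hidden weights and biases, and $H$ concatenating the hidden layer outputs with the direct input links.

$$H = \big[\, g(XW + b) \;\; X \,\big], \qquad \beta = H^{\dagger}\, Y$$

Here $H^{\dagger}$ is the Moore-Penrose pseudo-inverse of $H$. Because only $\beta$ is solved for and $W$, $b$ stay fixed, training reduces to one linear least-squares problem, which is the source of the fast training speed mentioned in this embodiment.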

In this embodiment, the final decision output of an integrated classifier is usually determined by majority voting, but that method cannot give the probability of each possible fault label. A softmax function is therefore introduced to convert the output of each single classifier into a probability matrix with values in the range [0, 1]; the function is expressed as follows:

$$Y = \mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)}$$

The outputs of all RVFL classifiers are then computed, the probabilities corresponding to the same fault label are averaged, and the fault label with the largest average probability is output as the final result $Y_o$.
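The decision rule (softmax per classifier, average per label, pick the largest average) can be sketched as follows. The function names, label names, and raw scores are hypothetical; subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    """Map raw classifier outputs to probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_decision(raw_outputs, labels):
    """Average per-label probabilities over classifiers and pick the largest."""
    probs = [softmax(row) for row in raw_outputs]
    means = [sum(col) / len(probs) for col in zip(*probs)]
    best = max(range(len(labels)), key=means.__getitem__)
    return labels[best], means


labels = ["normal", "T1 open circuit", "sensor offset"]
raw = [                # one row of raw outputs per RVFL classifier
    [2.0, 0.5, 0.1],
    [1.5, 1.0, 0.2],
    [0.3, 2.2, 0.1],
]
label, means = ensemble_decision(raw, labels)
```

Unlike majority voting, the returned `means` list exposes the averaged probability of every fault label, so a close call between two labels remains visible to the operator.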

In the step B5, the RVFL classifiers are evaluated by using the metrics F-score and G-means; the expression of the F-score is as follows:

$$F\text{-score} = \frac{(1 + \beta_1^2)\, P R}{\beta_1^2 P + R}$$

where F-score is a combined value of accuracy and recall, $\beta_1$ is a parameter measuring the relative importance of P and R, P is the accuracy of the RVFL classifier, and R is the recall rate of the RVFL classifier;

the expression of G-means is:

$$G\text{-means} = \sqrt{P \cdot R}$$

The expression of the integrated classification model output decision in step B6 is:

$$Y_o = \max(Y_{c1}, Y_{c2}, Y_{c3}, \ldots, Y_{cn})$$

$$Y_{c1} = \mathrm{mean}(Y_{11}, Y_{21}, Y_{31}, \ldots, Y_{R1})$$

$$Y = \mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)}$$

where $Y_o$ is the output of the integrated classification model, $Y_{cn}$ is the mean probability of fault label cn, $Y_{c1}$ is the mean probability of fault label c1, $Y_{R1}$ is the probability of fault label c1 from the R-th RVFL classifier, mean(·) is the mean function, Y is the output of an individual RVFL classifier in the integrated classifier, $y_i$ is the output value corresponding to the i-th label, $y_j$ is the output value corresponding to the j-th label, softmax(·) is the normalized exponential function, and exp(·) is the exponential function with the natural constant e as its base.
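The decision rule can be illustrated with a short sketch: each classifier's raw scores are passed through softmax, the probabilities are averaged per fault label, and the label with the largest mean is returned. The function names and the example scores are hypothetical.

```python
import math

def softmax(scores):
    """Normalized exponential: maps raw label scores to probabilities in [0, 1]."""
    peak = max(scores)  # subtracting the maximum avoids overflow; the result is unchanged
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_decision(raw_outputs):
    """raw_outputs[r][c] is the raw score of classifier r for fault label c."""
    probs = [softmax(row) for row in raw_outputs]
    n_labels = len(probs[0])
    # Mean probability of each label over all classifiers (Y_c1 ... Y_cn).
    mean_probs = [sum(p[c] for p in probs) / len(probs) for c in range(n_labels)]
    # Final output: the label whose mean probability is largest (Y_o).
    label = max(range(n_labels), key=mean_probs.__getitem__)
    return label, mean_probs

# Three hypothetical classifiers scoring three fault labels.
label, mean_probs = ensemble_decision([[2.0, 1.0, 0.0], [0.5, 2.5, 0.0], [0.0, 3.0, 1.0]])
```

Unlike majority voting, the averaged probabilities also indicate how strongly the ensemble favors the chosen label.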

In this embodiment, TP and TN denote the numbers of positive samples and fault samples, respectively, that the classifier outputs correctly on the test sample set, and FP and FN denote the numbers of positive samples and fault samples that the classifier outputs incorrectly on the test sample set. Precision and recall are two basic indexes of a classification model, expressed as P = TP/(TP + FP) and R = TP/(TP + FN), respectively. The F-score is a combined value of precision and recall:

$$F\text{-}score = \frac{(1 + \beta_1^2)\, P\, R}{\beta_1^2\, P + R}$$

where $\beta_1$ is a parameter weighing the relative importance of P and R. In the invention $\beta_1$ equals 1, which means that P and R are of equal importance.

G-means is the geometric mean of P and R, expressed as follows:

$$G\text{-}means = \sqrt{P \cdot R}$$

Classification performance on unbalanced data is generally evaluated with the two indexes F-score and G-means; the higher the F-score and G-means values, the better the performance of the RVFL classifier.
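As a worked illustration of the two indexes, the sketch below computes them from confusion counts. It assumes the standard weighted F-measure form for the F-score and treats the counts as belonging to a single class; the counts are made up for the example, and the text does not state how the five classes are aggregated.

```python
import math

def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP) and R = TP / (TP + FN), as defined in the embodiment."""
    return tp / (tp + fp), tp / (tp + fn)

def f_score(p, r, beta1=1.0):
    """Weighted combination of P and R; beta1 = 1 weights them equally."""
    b2 = beta1 ** 2
    return (1 + b2) * p * r / (b2 * p + r)

def g_means(p, r):
    """Geometric mean of P and R."""
    return math.sqrt(p * r)

# Hypothetical counts: 45 true positives, 5 false positives, 15 false negatives.
p, r = precision_recall(tp=45, fp=5, fn=15)   # P = 0.9, R = 0.75
f1 = f_score(p, r)                            # 1.35 / 1.65
g = g_means(p, r)                             # sqrt(0.675)
```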

Example 2

The invention provides a power electronic system fault diagnosis method combining resampling and integrated learning. Taking the $T_1$ open-circuit fault and $T_3$ open-circuit fault of the power devices of a single-phase pulse rectifier, together with the current sensor offset fault and gain fault, as examples, the method performs online diagnosis and localization of the different fault modes while considering different imbalance ratios between normal samples and fault samples, grid-side voltage changes, and load changes. The basic circuit topology of the single-phase pulse rectifier is shown in fig. 2.

The input of the power electronic system fault diagnosis method combining resampling and integrated learning is the grid-side current $i_s$; the output of the online fault diagnosis algorithm of the single-phase pulse rectifier is the fault label output by the integrated classification model.

First, the raw operation data of the single-phase pulse rectifier are collected, comprising the grid-side voltage $u_s$, the grid-side current $i_s$ and the DC-side voltage $u_{dc}$. The fundamental frequency of the current is 50 Hz and the sampling frequency is 20 kHz. The 200-dimensional current data within one half fundamental period are intercepted under normal operation, $T_1$ open-circuit fault, $T_3$ open-circuit fault, current sensor offset fault and gain fault of the single-phase pulse rectifier and used as raw data samples. With the grid-side voltage $u_s$ varied over the range 50 V to 70 V and the load resistance varied over the range 20 Ω to 40 Ω, richer normal-operation and fault data samples are obtained on the experimental platform of the power electronic system. Each data sample is divided by the maximum value in its group of sample data for per-unit processing and is labeled with its fault category label. The fault category labels of normal operation, power device $T_1$ open-circuit fault, $T_3$ open-circuit fault, current sensor offset fault and gain fault are defined as 0, 1, 2, 3 and 4, respectively. A data set comprising a large number of normal samples, a small number of power device open-circuit fault samples and a small number of sensor fault samples is produced, giving 320 groups of data samples in total, in which the ratio of normal samples, $T_1$ and $T_3$ power device open-circuit fault samples, and current sensor offset and gain fault samples is 4.

Further, the fast Fourier transform is used to extract spectral features from the data samples. The amplitudes of the frequency-domain components after the fast Fourier transform are computed as new features, giving 100-dimensional frequency-domain feature data. These are combined with the original 200-dimensional current data samples to form a feature vector of dimension 300.
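The dimensions quoted above follow from the sampling setup: 20 kHz divided by 50 Hz gives 400 samples per fundamental period, hence 200 per half period. A minimal sketch of the per-unit and FFT steps is given below. Which 100 spectral bins are kept is not stated in the text, so keeping the first 100 bins of the one-sided spectrum is an assumption, as is the function name.

```python
import numpy as np

FS = 20_000          # sampling frequency in Hz (from the embodiment)
F0 = 50              # fundamental frequency in Hz (from the embodiment)
N = FS // F0 // 2    # samples in half a fundamental period: 200

def build_feature_vector(current):
    """Per-unit a half-period current window and append its FFT amplitudes."""
    x = np.asarray(current, dtype=float)
    # Per-unit processing: divide by the maximum value in the sample group.
    x_pu = x / np.max(x)
    # Amplitudes of the one-sided spectrum; keeping the first N // 2 bins is an assumption.
    amplitudes = np.abs(np.fft.rfft(x_pu))[: N // 2]
    # 200 time-domain values followed by 100 frequency-domain amplitudes.
    return np.concatenate([x_pu, amplitudes])

# Synthetic positive half-sine standing in for a measured grid-side current.
t = np.arange(N) / FS
features = build_feature_vector(10.0 * np.sin(2 * np.pi * F0 * t))
```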

Further, to reduce the dimension of the feature vector and remove redundant and irrelevant components in the feature signal, the ReliefF algorithm and the mRMR algorithm are combined for feature ranking and verification. First, the ReliefF algorithm computes weights that evaluate the correlation between the feature attributes and the fault categories; the 200-dimensional feature attributes with weights smaller than zero are removed, and the 100-dimensional feature attributes with larger weights are retained. Then the maximum relevance minimum redundancy (mRMR) algorithm further evaluates the correlation between the feature attributes and the fault categories as well as the redundancy among the features, and the feature attributes are re-ranked.
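The ReliefF stage can be sketched as follows. This is a simplified version of the commonly published ReliefF update (nearest hits lower a weight, prior-weighted nearest misses raise it), not the exact expression of the patent, and the mRMR re-ranking is omitted. The function names, the default `k`, and the toy data are assumptions.

```python
import random

def relieff_weights(X, y, k=2, n_iter=None, seed=0):
    """Simplified ReliefF weights for numeric features (one weight per feature)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = n_iter or n
    # Feature ranges normalize diff() to [0, 1]; constant features get range 1.
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]
    prior = {c: sum(1 for v in y if v == c) / n for c in set(y)}

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def nearest(i, cls):
        # k nearest samples of class cls, excluding sample i itself.
        idx = [j for j in range(n) if j != i and y[j] == cls]
        idx.sort(key=lambda j: sum(diff(a, X[i], X[j]) for a in range(d)))
        return idx[:k]

    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hits = nearest(i, y[i])
        misses = {c: nearest(i, c) for c in prior if c != y[i]}
        for a in range(d):
            # Differences to same-class neighbours lower the weight.
            w[a] -= sum(diff(a, X[i], X[h]) for h in hits) / (m * k)
            # Differences to other-class neighbours raise it, scaled by class priors.
            for c, near in misses.items():
                scale = prior[c] / (1 - prior[y[i]])
                w[a] += scale * sum(diff(a, X[i], X[q]) for q in near) / (m * k)
    return w

def select_by_weight(weights, n_keep):
    """Drop negative-weight features, then keep the n_keep largest weights."""
    ranked = sorted((a for a, v in enumerate(weights) if v >= 0), key=lambda a: -weights[a])
    return ranked[:n_keep]

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.05, 0.1], [0.0, 0.6],
     [1.0, 0.2], [0.9, 0.8], [0.95, 0.5], [1.0, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights = relieff_weights(X, y)
```

On this toy set the weight of feature 0 comes out larger than that of feature 1, which is the behavior the filtering step relies on.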

Further, to verify the effectiveness of the feature selection result, the feature vectors ranked by the mRMR algorithm, by the ReliefF algorithm, and by the combined mRMR and ReliefF algorithm are compared. Common classification algorithms are trained and tested while feature vectors of increasing dimension are introduced step by step, and the average test accuracy is calculated, as shown in fig. 3. When the feature dimension is greater than 40, the test accuracy obtained with every feature vector exceeds 95.1%; that is, at high dimension the features are acceptably effective regardless of the feature selection applied. However, with the most critical features selected by the combined mRMR and ReliefF algorithm, the RVFL classifier already reaches 96.6% accuracy with 20-dimensional features. Finally, the 20 top-ranked feature attributes are selected as the new feature vector.

Further, based on the new feature vector after feature extraction and selection, by adjusting the number of minority class samples for power device faults and sensor faults, different imbalance ratios 4. The data sets D1, D2 and D3 are oversampled with the safe-level SMOTE algorithm to obtain new samples, so that the number of minority-class samples matches the number of normal data samples and the normal samples and fault data samples are balanced; a total of 480 groups of samples construct the balanced data sets D11, D22 and D33. Different classification models are obtained by training a common classification algorithm on the unbalanced data sets D1, D2 and D3 and on the balanced data sets D11, D22 and D33; their average test accuracies on the same test set are 0.9511, 0.8455, 0.7909, 0.9841, 0.9034 and 0.8659, respectively. The resampled data sets therefore achieve better classification performance.
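The oversampling step can be sketched as below. The interpolation x_new = p + β(n − p) and the safe-level ratio come from the claims; the rule used here to draw β from that ratio follows the commonly published Safe-Level-SMOTE algorithm and is an assumption, since this text does not spell it out. The function names are hypothetical.

```python
import random

def safe_level(idx, data, labels, minority, k):
    """Number of minority-class samples among the k nearest neighbours of data[idx]."""
    others = sorted(
        (i for i in range(len(data)) if i != idx),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(data[idx], data[i])),
    )
    return sum(1 for i in others[:k] if labels[i] == minority)

def synthesize(p, n, sl_p, sl_n, rng):
    """Return x_new = p + beta * (n - p), or None when both points look like noise."""
    if sl_n == 0:
        if sl_p == 0:
            return None        # p and n are both surrounded by majority samples
        beta = 0.0             # n is unsafe, so the new sample duplicates p
    else:
        ratio = sl_p / sl_n    # safe-level ratio
        if ratio == 1:
            beta = rng.uniform(0.0, 1.0)
        elif ratio > 1:
            beta = rng.uniform(0.0, 1.0 / ratio)   # stay closer to the safer p
        else:
            beta = rng.uniform(1.0 - ratio, 1.0)   # stay closer to the safer n
    return [pi + beta * (ni - pi) for pi, ni in zip(p, n)]

rng = random.Random(0)
# p has safe level 4 and its neighbour n has safe level 2, so beta lies in [0, 0.5].
x_new = synthesize([0.0, 0.0], [2.0, 2.0], sl_p=4, sl_n=2, rng=rng)
```

Repeating `synthesize` over minority samples and their minority neighbours until the class counts match yields a balanced set of the kind used for D11, D22 and D33.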

Further, based on the resampled balanced data sets, classifiers are trained with the RVFL network and evaluated with different metrics: test accuracy, F-score and G-means. With the parameters of the RVFL network model adjusted, a single RVFL classifier trained and tested on data set D11 achieves a test accuracy of 0.9667, with F-score and G-means of 0.9619 and 0.9768, respectively; on data set D22, a single RVFL classifier achieves a test accuracy of 0.8875, with F-score and G-means of 0.8503 and 0.8988, respectively; on data set D33, a single RVFL classifier achieves a test accuracy of 0.8381, with F-score and G-means of 0.7214 and 0.8096, respectively.

Furthermore, n different RVFL classifiers, where n is 5, are obtained by changing the activation function type and the number of hidden-layer nodes of the RVFL network model. Following the idea of ensemble learning, these well-performing RVFL classifiers are then combined into a trained integrated classification model, which completes the learning task.

Further, as shown in fig. 4, the outputs of all RVFL classifiers are computed, the mean probability of each fault label is calculated across the classifiers, and the fault label with the maximum mean probability is output as the final result. Trained and tested on data set D11, the integrated RVFL classifier achieves a test accuracy of 0.9810, with F-score and G-means of 0.9751 and 0.9838, respectively; on data set D22, the integrated RVFL classifier achieves a test accuracy of 0.9143, with F-score and G-means of 0.8988 and 0.9103, respectively; on data set D33, the integrated RVFL classifier achieves a test accuracy of 0.8476, with F-score and G-means of 0.7522 and 0.8223, respectively.

Further, current data within a half fundamental period are sampled in real time and processed in the same order as in the offline training process: the data are per-unitized, frequency-domain features are extracted with the fast Fourier transform algorithm, and features are selected with the ReliefF and mRMR algorithms to construct the feature vector. The feature vector is input into the trained integrated classification model for diagnosis and decision, and the fault category label is output.

The diagnosis algorithm is tested online on the RT-box controller and a physical hardware test platform. The test results of the single-phase pulse rectifier under open-circuit faults of different power devices and under gain and offset faults of the current sensor are shown in figs. 5-8 (fig. 5 shows the diagnosis test result of the $T_1$ open-circuit fault at grid-side voltage $u_s$ = 70 V and load resistance $R_L$ = 40 Ω; fig. 6 shows the diagnosis test result of the $T_3$ open-circuit fault at $u_s$ = 70 V and $R_L$ = 40 Ω; fig. 7 shows the diagnosis test result of the current sensor offset fault at $u_s$ = 80 V and $R_L$ = 20 Ω; fig. 8 shows the diagnosis test result of the current sensor gain fault at $u_s$ = 80 V and $R_L$ = 20 Ω).

Claims

1. A power electronic system fault diagnosis method combining resampling and integrated learning is characterized by comprising the following steps:

S3, obtaining a feature vector of the per-unit data through a feature extraction selector according to the frequency-domain features of the per-unit data;

2. The method for diagnosing the fault of the power electronic system combining resampling and ensemble learning according to claim 1, wherein the expression of the per-unit processing in the step S1 is as follows:

$$x_{in} = \frac{x_i}{\max(x_1, x_2, \ldots, x_N)}$$

3. The power electronic system fault diagnosis method combining resampling and ensemble learning according to claim 1, wherein the construction of the feature extraction selector in the step S3 comprises the following steps:

wherein W(A) is the weight of the A-th feature, A is the feature number, R is the sample data, $H_j$ is the j-th nearest neighbor of R, j is the nearest-neighbor number, k is the total number of selected nearest neighbors, diff(A, R, $H_j$) is the difference between sample R and sample $H_j$ on feature A, p(·) is the prior probability of a class, M is the number of sampling times, class(R) is the class label to which R belongs, $M_j(C)$ is the j-th nearest-neighbor sample in class C, C is the number of all labels, diff(A, $R_1$, $R_2$) is the difference between sample $R_1$ and sample $R_2$ on feature A, $R_1$ and $R_2$ are generic symbols denoting sample data, and min(·) is the minimum function;

wherein D(S, c) is the average mutual information between the feature subset S and the fault category c, S is the feature subset, c is the fault category, I(·) is the mutual information measure, $z_i$ is the i-th feature attribute, i is the feature attribute number, $p(z_i)$ is the marginal probability density function of the i-th feature attribute, p(c) is the marginal probability density function of the fault category c, and $p(z_i, c)$ is the joint probability density function of $z_i$ and c;

$$\mathrm{mRMR} = \max(D - R)$$

A9, judging whether m is the minimum value for which the test accuracy of the feature vector is greater than 95%; if so, the feature extraction selector is obtained; otherwise, adjusting the value of m and returning to step A5.

4. The power electronic system fault diagnosis method combining resampling and ensemble learning according to claim 3, wherein the method for constructing the ensemble classification model in the step S4 comprises the following steps:

5. A power electronic system fault diagnosis method combining resampling and integrated learning according to claim 4, wherein the step B3 comprises the following steps:

B302, obtaining the safe-level ratio according to the fault data set:

$$S_{lr} = \frac{S_{lp}}{S_{ln}}$$

wherein $S_{lr}$ is the safe-level ratio, $S_{lp}$ is the number of instances in the fault data set among the k nearest neighbors of sample p, p is a sample of the fault data set, $S_{ln}$ is the number of instances in the fault data set among the k nearest neighbors of sample n, and n is a nearest-neighbor sample of sample p;

$$x_{new} = p + \beta\,(n - p)$$

wherein $x_{new}$ is a new sample of the fault data set, and β is a coefficient that differs according to the safe-level ratio;

6. A power electronic system fault diagnosis method combining resampling and ensemble learning according to claim 4, wherein the RVFL classifier is evaluated in step B5, which is specifically:

evaluating the RVFL classifier with the metrics F-score and G-means; the expression of the F-score is:

$$F\text{-}score = \frac{(1 + \beta_1^2)\, P\, R}{\beta_1^2\, P + R}$$

wherein F-score is the combined value of precision and recall, P is the precision of the RVFL classifier, R is the recall of the RVFL classifier, and $\beta_1$ is a parameter weighing the relative importance of P and R;

the expression of G-means is:

$$G\text{-}means = \sqrt{P \cdot R}$$

7. A power electronic system fault diagnosis method combining resampling and ensemble learning according to claim 4, wherein the expression of the ensemble classification model output decision in step B6 is:

$$Y_o = \max(Y_{c1}, Y_{c2}, Y_{c3}, \ldots, Y_{cn})$$

$$Y_{c1} = \mathrm{mean}(Y_{11}, Y_{21}, Y_{31}, \ldots, Y_{R1})$$

$$Y = \mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)}$$

wherein $Y_o$ is the output of the integrated classification model, $Y_{cn}$ is the mean probability of fault label cn, $Y_{c1}$ is the mean probability of fault label c1, $Y_{R1}$ is the probability of fault label c1 from the R-th RVFL classifier, mean(·) is the mean function, Y is the output of an individual RVFL classifier in the integrated classifier, $y_i$ is the output value corresponding to the i-th label, $y_j$ is the output value corresponding to the j-th label, softmax(·) is the normalized exponential function, and exp(·) is the exponential function with the natural constant e as its base.