CN114781495A - Intelligent ammeter fault classification method based on sample global rebalancing - Google Patents
- Publication number: CN114781495A (application CN202210348671.1A)
- Authority
- CN
- China
- Prior art keywords: sample, samples, codes, data set, fault
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G01R35/04: Testing or calibrating of instruments for measuring time integral of power or current
- G06F18/24323: Pattern recognition; classification techniques; tree-organised classifiers
- G06N3/045: Neural networks; architecture; combinations of networks
Abstract
The embodiment of the invention provides a smart meter fault classification method based on sample global rebalancing, which comprises the following steps: partitioning the historical fault data of smart meters of different categories to obtain a plurality of two-class data sets; constructing, for each two-class data set, a fusion model of a VAE and a GAN, taking each sample as model input and splitting the sample's latent code into an important-feature code and a secondary-feature code; obtaining variant latent codes of the samples through a latent-code reconstruction technique, and generating multiple reliable similar variant samples that preserve the important features of the input sample through decoder reconstruction, a mutual-information constraint, and the adversarial discriminator; designing a feature repulsion technique acting between the latent codes of the two sample classes to perform supervised feature representation learning; and using a hybrid coding technique to append the per-dimension reconstruction errors of a sample as a supplement to its important-feature code, judging the classification result of the sample under test on each two-class data set accordingly, and obtaining its fault category by hard voting.
Description
[Technical Field]
The invention relates to a fault classification method for smart meters, and in particular to a smart meter fault classification method based on sample global rebalancing.
[Background]
Since the beginning of the 21st century, smart meters have been applied on a large scale in the field of electricity-consumption information collection, and the power industry has entered the big-data era. As the terminal device of China's current smart grid construction, the smart meter integrates metering, display, communication, and other functions, and plays an important supporting role in the stable operation of the grid. As the smart grid develops, smart meter functions grow increasingly rich and the number of fault types grows accordingly, so timely maintenance of faulty meters is of great significance for stable grid operation and stable electricity use. However, because operation and maintenance personnel often only have repair experience for certain specific faulty meters, when a smart meter fails it is difficult to dispatch personnel with the relevant maintenance experience without knowing the meter's fault type. Accurate prediction of the smart meter fault type is therefore crucial.
With the development of the smart grid, demand for smart meters grows daily and the number of manufacturers increases; design schemes, component types, and process flows differ across manufacturers, and complex production and application environments add further variation, so smart meter fault types are complex and diverse. Fault occurrence is influenced by all of these factors, and finding the mapping between them and the fault types is extremely complex, which is a typical machine learning problem. In addition, different fault types occur with different frequencies, so the fault data exhibit a multi-modal distribution, which further increases the difficulty of prediction. Learning directly from imbalanced data shifts the classifier's predictions toward the majority class. Balance between classes can be achieved by data sampling, but such methods lack a mechanism to guarantee the authenticity of synthesized samples; a generative adversarial network can improve the authenticity of generated samples through the adversarial game between generator and discriminator, but suffers from mode collapse and cannot ensure that the distribution of minority-class samples is consistent before and after balancing. Based on this analysis, the invention proposes a smart meter fault classification method based on sample global rebalancing to improve the performance of smart meter fault classification.
[Summary of the Invention]
In view of this, the invention provides a smart meter fault classification method based on sample global rebalancing, so as to improve the performance of smart meter fault classification.
The invention provides a smart meter fault classification method based on sample global rebalancing, which comprises the following steps:
(1) taking the historical fault data of smart meters of different categories as the input data set and partitioning it to obtain a plurality of two-class data sets, specifically:

inputting the actual smart meter fault data set, in which each sample comprises the characteristic variables working duration, arrival batch number, power supply unit number, electric energy meter category, fault identification month, installation month, province, equipment specification, communication mode, and equipment identification; the fault category labels comprise 11 types: appearance fault, metering fault, storage unit fault, processing unit fault, display unit fault, control unit fault, power supply unit fault, communication unit fault, clock unit fault, software fault, and other faults; traversing each class of samples in the fault data set, taking all samples of that class as the minority-class sample set and all samples of the other classes as the majority-class sample set, so that the original data set is converted into 11 two-class data sets; each two-class data set can be described as:

X = [X_min, X_maj]

where X is a two-class data set and x denotes any sample in it, i.e. x ∈ X; X_min is the minority-class sample set, with any sample x_min ∈ X_min; X_maj is the majority-class sample set, with any sample x_maj ∈ X_maj;
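As a minimal sketch, the one-vs-rest partition described above can be written as follows in pure Python (the function name and data layout are illustrative assumptions, not from the patent):

```python
def make_binary_datasets(samples, labels):
    """Convert a multi-class fault data set into one two-class data set
    per fault category: samples of category c form the minority set
    X_min, all remaining samples form the majority set X_maj."""
    datasets = {}
    for c in sorted(set(labels)):
        x_min = [s for s, y in zip(samples, labels) if y == c]
        x_maj = [s for s, y in zip(samples, labels) if y != c]
        datasets[c] = (x_min, x_maj)
    return datasets
```

Applied to the 11-category meter fault data set, this yields the 11 two-class data sets X = [X_min, X_maj] used below.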
(2) constructing, for each two-class data set, a fusion model of VAE and GAN, taking each sample as model input, and splitting its latent code into an important-feature code and a secondary-feature code, specifically:
based on the imbalanced two-class data set X = [X_min, X_maj] obtained in step (1), a VAE/GAN model is built and trained:

the VAE learns the posterior distribution q(z|x) of the latent variable of sample x with its encoder, and the distribution p(x|z) of sample x with its decoder:

z ~ Enc(x) = q(z|x)
x̂ ~ Dec(z) = p(x|z)

where Enc is the encoder, Dec is the decoder, z is the latent code obtained by encoding sample x with the VAE, and x̂ is the reconstructed sample generated by the VAE;

the decoder of the VAE is fused with the generator of the GAN to obtain the fusion model of VAE and GAN, whose optimization objective combines three weighted losses:

L = α1·L_prior + α2·L_likelihood + L_GAN
L_prior = D_KL(q(z|x) ‖ N(0, I))

where α1 and α2 are hyperparameters; L_prior is the prior loss caused by the difference between the distributions q(z|x) and N(0, I), N(0, I) being the normal distribution with mean 0 and variance 1 and D_KL(·) the KL divergence; L_likelihood is the likelihood loss obtained by sampling q(z|x) and evaluating p(x|z); L_GAN is the adversarial loss of the GAN; z_min is the latent code of x_min and z_maj that of x_maj; E[·] is the expectation operator and Dis is the discriminator;

based on this optimization objective, an m-dimensional latent code z containing the features of a given sample x is obtained; during training, the latent code that the VAE assigns to sample x is split into two parts, an important-feature code z_KF and a secondary-feature code z_SF; the important-feature code corresponds to important attributes of the sample, is a high-level feature representation, and is the key feature determining the sample's class, while the secondary-feature code captures individual attributes of the sample, generally without class generality, so a change in it does not change the sample's class; that is, the first m1 dimensions of z form z_KF, the last m2 dimensions form z_SF, and m = m1 + m2; the latent code z_min of sample x_min is expressed as:

z_min = [z_min,KF, z_min,SF] = Enc(x_min)

where z_min,KF is the important-feature code of x_min and z_min,SF its secondary-feature code;
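The split of the m-dimensional latent code into the first m1 important-feature dimensions and the last m2 secondary-feature dimensions can be sketched as follows (with m1 = 5, m2 = 1 as stated later in the description; the helper name is illustrative):

```python
import numpy as np

M1, M2 = 5, 1  # dimensions of z_KF and z_SF (values given later in the patent)

def split_latent(z):
    """Split a latent code z = [z_KF, z_SF]: the first M1 dimensions are
    the important-feature code that determines the sample class, the last
    M2 dimensions are the sample-specific secondary-feature code."""
    z = np.asarray(z)
    return z[..., :M1], z[..., M1:M1 + M2]
```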
(3) obtaining variant latent codes of the samples through a latent-code reconstruction technique, and generating multiple reliable similar variant samples that preserve the important features of the input sample through decoder reconstruction, a mutual-information constraint, and the adversarial discriminator, specifically:

based on the sample latent codes obtained in step (2) and their important- and secondary-feature codes, the important-feature code is kept unchanged and the secondary-feature code is randomly replaced to generate similar samples sharing the characteristics of the given sample;

1) the variant latent code and the decoder recovery of the variant sample are computed as follows:

z̃_min = [z_min,KF, z̃_SF],  z̃_SF ~ N(0, I)
x̃_min = Dec(z̃_min)

where z̃_min is the variant latent code, x̃_min is the variant sample obtained by decoding z̃_min, and z̃_SF is sampled from its prior normal distribution N(0, I);

to make the important-feature code correspond to important attributes of the sample, a constraint maximizing the mutual information between minority-class samples and their important-feature codes is introduced on top of the optimization objective of step (2): the important-feature code of a sample can be inferred in reverse through a mutual-information inference model Q, with optimization objective L_Inference weighted by the hyperparameter β1; the model Q is implemented by reusing the encoder Enc, so that z′_min,KF = Q(x̂_min) is the important-feature code of the reconstructed sample x̂_min obtained by reverse inference, and z″_min,KF = Q(x̃_min) is the important-feature code of the variant sample x̃_min obtained by reverse inference;

2) to generate reliable variant samples, the GAN optimization objective L_GAN of step (2) is modified into two parts, the generator objective L_Gen and the discriminator objective L_Dis, weighted by the hyperparameters γ1, γ2, γ3, and γ4;

3) the above process is applied to each minority-class sample x_min in the data set to obtain its variant latent codes; after the traversal, the obtained variant samples are merged with the original minority samples into a new minority-class sample set, which is sampled until the number of samples matches the majority class, yielding a balanced data set;
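The variant-generation loop of this step can be sketched as follows, assuming a trained decoder is available as a callable `decode` (all names are illustrative; the real decoder is the trained VAE/GAN model described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def variant_latent(z_min, m1=5):
    """Keep the important-feature code (first m1 dims) unchanged and
    resample the secondary-feature code from its prior N(0, I)."""
    z = np.array(z_min, dtype=float)
    z[m1:] = rng.standard_normal(z.size - m1)
    return z

def rebalance(z_min_list, n_maj, decode, m1=5):
    """Decode the original minority latent codes, then keep generating
    variant samples until the minority count matches the majority count."""
    out = [decode(z) for z in z_min_list]
    i = 0
    while len(out) < n_maj:
        out.append(decode(variant_latent(z_min_list[i % len(z_min_list)], m1)))
        i += 1
    return out
```

Because only the secondary dimensions are resampled, every variant shares the class-determining important features of its source sample.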
(4) a feature repulsion technique acting between the latent codes of the two sample classes is designed to perform supervised feature representation learning, specifically:

based on the balanced data set obtained in step (3), in each training cycle an arbitrary batch of samples is taken to form Z_maj,KF and Z_min,KF, where Z_maj,KF is the feature set formed by the important-feature latent codes of all majority-class samples in the training batch and Z_min,KF is the corresponding set for all minority-class samples in the batch; the feature repulsion loss L_feature_force is calculated as follows:

d_loss = Nearest_Neighbor_distance(Z_min,KF, Z_maj,KF, n)

where Nearest_Neighbor_distance() is a distance-loss calculation function that, for each z_min,KF ∈ Z_min,KF, finds the n nearest neighbors in Z_maj,KF (here n = 20) and returns the set d_loss of the average distances obtained over all of Z_min,KF; L_feature_class is the class feature repulsion loss: the first dimension of the important-feature code is defined to represent the class information of the sample, ρ is a hyperparameter serving as the feature repulsion label of minority-class samples, and −ρ is the feature repulsion label of majority-class samples; μ1 and μ2 are hyperparameters;
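The Nearest_Neighbor_distance() function used by the feature repulsion term can be sketched in NumPy as follows (a brute-force version; the patent does not specify the implementation):

```python
import numpy as np

def nearest_neighbor_distance(z_min_kf, z_maj_kf, n=20):
    """For every minority important-feature code, average the Euclidean
    distances to its n nearest majority-class codes, returning the
    distance-loss set d_loss (one value per minority code)."""
    z_min_kf = np.asarray(z_min_kf, dtype=float)
    z_maj_kf = np.asarray(z_maj_kf, dtype=float)
    n = min(n, len(z_maj_kf))
    d_loss = []
    for z in z_min_kf:
        d = np.linalg.norm(z_maj_kf - z, axis=1)  # distances to all majority codes
        d_loss.append(np.sort(d)[:n].mean())      # mean over the n nearest
    return np.array(d_loss)
```

Maximizing the mean of d_loss pushes the minority important-feature codes away from the majority class in latent space, which is the repulsion effect described above.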
(5) a hybrid coding technique is used to append the per-dimension reconstruction errors of a sample as a supplement to the important-feature code, the classification result of the sample under test is judged on each two-class data set accordingly, and its fault category is obtained by hard voting, specifically:

based on cyclically training the model through steps (1) to (4), a reliable balanced data set is obtained; the important-feature code of a sample and its per-dimension reconstruction error e are combined as the new sample feature for distinguishing sample classes; for an input sample x, the constructed feature is denoted F = [z_KF, e], where e is the per-dimension reconstruction error between x and its VAE reconstruction x̂;

F and the corresponding class labels are input into a random forest classifier for training, giving a single imbalanced binary classifier; repeating this process yields 11 random forest classifiers RF_j, where j ∈ [1, 11] is the classifier index; for a sample under test x_test, the predicted label ŷ_test is obtained by hard voting over the outputs of the 11 classifiers.
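The hybrid feature construction and the final hard vote over the 11 one-vs-rest classifiers can be sketched as follows (plain callables stand in for the trained random forests; breaking ties by classifier order is an assumption, as the patent does not specify tie handling):

```python
import numpy as np

def hybrid_features(z_kf, x, x_hat):
    """Hybrid coding: append the per-dimension reconstruction error
    e = |x - x_hat| to the important-feature code, F = [z_KF, e]."""
    e = np.abs(np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float))
    return np.concatenate([np.asarray(z_kf, dtype=float), e])

def hard_vote(binary_classifiers, f):
    """Each one-vs-rest classifier votes whether the sample belongs to
    its class; the predicted fault category is the first class whose
    classifier votes positive, or None if no classifier fires."""
    votes = [j for j, clf in enumerate(binary_classifiers) if clf(f) == 1]
    return votes[0] if votes else None
```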
In the above method, in step (2), the structures of the encoder Enc, the decoder Dec, and the discriminator Dis are built from the following operations: Reshape() is a tensor reshaping function; Conv1D() is a one-dimensional convolutional layer constructor; Flatten() is a tensor flattening function; Dense() is a fully connected layer constructor; LeakyReLU and tanh denote the corresponding activation functions; Dropout() is a random deactivation (dropout) function.
In the above method, in step (2), α1 and α2 take the values 0.1 and 1 respectively; m1 is 5 and m2 is 1.

In the above method, in step (3), β1 is 1; γ1 and γ2 both take the value 0.5, and γ3 and γ4 both take the value 1.

In the above method, in step (4), ρ is 0.4; μ1 and μ2 take the values 1 and 2 respectively.
The smart meter fault classification method based on sample global rebalancing improves the precision and recall of smart meter fault classification.
According to the technical scheme, the invention has the following beneficial effects:
in the technical scheme implemented by the invention, sample generation targeting specified features is realized by redefining the VAE latent code, which avoids the data-authenticity and overfitting problems caused by data oversampling algorithms and mitigates the mode collapse problem of existing sample generation algorithms; the feature repulsion technique and the hybrid coding technique together alleviate the difficulty of feature extraction and classification for samples in the class-overlap region. The method improves the robustness of the model, and thereby the precision and recall of smart meter fault classification.
[Description of the Drawings]
In order to illustrate the technical solution of the present invention more clearly, the drawings required by the invention are briefly described below. The drawings in the following description are only some embodiments of the invention; for those skilled in the art, other drawings can be obtained from them without inventive labor.
FIG. 1 is a schematic flow chart of the framework of the smart meter fault classification method based on sample global rebalancing provided by the invention;

FIG. 2 is a schematic flow chart of smart meter fault classification;

FIG. 3 is a detailed schematic diagram of the algorithm of the invention.
[Detailed Description]
In order to better understand the technical scheme of the invention, the invention is described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the invention.
The invention provides a smart meter fault classification method based on sample global rebalancing. To handle smart meter fault classification, the invention decomposes the complex imbalanced multi-class problem into multiple imbalanced binary problems, designs a latent-code reconstruction technique to generate reliable variant samples that preserve the key features of a given sample, and obtains multiple balanced two-class data sets; a feature repulsion technique and a hybrid coding technique are designed to alleviate the difficulty of feature extraction and classification for samples in the class-overlap region; finally, the classification results of the sample under test on each two-class data set are integrated, and its fault category is obtained by hard voting.
Fig. 1 is a schematic flow chart of a framework of a method for classifying faults of an intelligent electric meter based on sample global rebalancing, which includes the following steps:
Step 101: input the actual smart meter fault data set, in which each sample comprises the characteristic variables working duration, arrival batch number, power supply unit number, electric energy meter category, fault identification month, installation month, province, equipment specification, communication mode, and equipment identification; the fault category labels comprise 11 types: appearance fault, metering fault, storage unit fault, processing unit fault, display unit fault, control unit fault, power supply unit fault, communication unit fault, clock unit fault, software fault, and other faults. Traverse each class of samples in the fault data set, taking all samples of that class as the minority-class sample set and all samples of the other classes as the majority-class sample set, so that the original data set is converted into 11 two-class data sets. Each two-class data set can be described as:

X = [X_min, X_maj]

where X is a two-class data set and x denotes any sample in it, i.e. x ∈ X; X_min is the minority-class sample set, with any sample x_min ∈ X_min; X_maj is the majority-class sample set, with any sample x_maj ∈ X_maj.
Step 102: based on the imbalanced two-class data set X = [X_min, X_maj] obtained in step 101, build and train a VAE/GAN model.

The VAE learns the posterior distribution q(z|x) of the latent variable of sample x with its encoder, and the distribution p(x|z) of sample x with its decoder:

z ~ Enc(x) = q(z|x)
x̂ ~ Dec(z) = p(x|z)

where Enc is the encoder, Dec is the decoder, z is the latent code obtained by passing sample x through the VAE encoder, and x̂ is the reconstructed sample generated by the VAE.

The decoder of the VAE is fused with the generator of the GAN to obtain the fusion model of VAE and GAN, whose optimization objective combines three weighted losses:

L = α1·L_prior + α2·L_likelihood + L_GAN
L_prior = D_KL(q(z|x) ‖ N(0, I))

where α1 and α2 are hyperparameters; L_prior is the prior loss caused by the difference between the distributions q(z|x) and N(0, I), N(0, I) being the normal distribution with mean 0 and variance 1 and D_KL(·) the KL divergence; L_likelihood is the likelihood loss obtained by sampling q(z|x) and evaluating p(x|z); L_GAN is the adversarial loss of the GAN; z_min is the latent code of x_min and z_maj that of x_maj; E[·] is the expectation operator and Dis is the discriminator.

Based on this optimization objective, an m-dimensional latent code z containing the features of a given sample x is obtained. During training, the latent code that the VAE assigns to sample x is split into two parts, an important-feature code z_KF and a secondary-feature code z_SF. The important-feature code corresponds to important attributes of the sample, is a high-level feature representation, and is the key feature determining the sample's class, while the secondary-feature code captures individual attributes of the sample, generally without class generality, so a change in it does not change the sample's class. That is, the first m1 dimensions of z form z_KF, the last m2 dimensions form z_SF, and m = m1 + m2. The latent code z_min of sample x_min is expressed as:

z_min = [z_min,KF, z_min,SF] = Enc(x_min)

where z_min,KF is the important-feature code of x_min and z_min,SF its secondary-feature code.
Step 103: obtain variant latent codes of the input samples through the latent-code reconstruction technique, and generate multiple reliable similar variant samples that preserve the important features of the input sample through decoder reconstruction, the mutual-information constraint, and the adversarial discriminator, specifically:

based on the sample latent codes obtained in step 102 and their important- and secondary-feature codes, the important-feature code is kept unchanged and the secondary-feature code is randomly replaced to generate similar samples sharing the characteristics of the given sample;

1) the variant latent code and the decoder recovery of the variant sample are computed as follows:

z̃_min = [z_min,KF, z̃_SF],  z̃_SF ~ N(0, I)
x̃_min = Dec(z̃_min)

where z̃_min is the variant latent code, x̃_min is the variant sample obtained by decoding z̃_min, and z̃_SF is sampled from its prior normal distribution N(0, I);

to make the important-feature code correspond to important attributes of the sample, a constraint maximizing the mutual information between minority-class samples and their important-feature codes is introduced on top of the optimization objective of step 102: the important-feature code of a sample can be inferred in reverse through a mutual-information inference model Q, with optimization objective L_Inference weighted by the hyperparameter β1; the model Q is implemented by reusing the encoder Enc, so that z′_min,KF = Q(x̂_min) is the important-feature code of the reconstructed sample x̂_min obtained by reverse inference, and z″_min,KF = Q(x̃_min) is the important-feature code of the variant sample x̃_min obtained by reverse inference;

2) to generate reliable variant samples, the GAN optimization objective L_GAN of step 102 is modified into two parts, the generator objective L_Gen and the discriminator objective L_Dis, weighted by the hyperparameters γ1, γ2, γ3, and γ4;

3) the above process is applied to each minority-class sample x_min in the data set to obtain its variant latent codes; after the traversal, the obtained variant samples are merged with the original minority samples into a new minority-class sample set, which is sampled until the number of samples matches the majority class, yielding a balanced data set.
Step 104: based on the balanced data set obtained in step 103, in each training cycle an arbitrary batch of samples is taken to form Z_maj,KF and Z_min,KF, where Z_maj,KF is the feature set formed by the important-feature latent codes of all majority-class samples in the training batch and Z_min,KF is the corresponding set for all minority-class samples in the batch. The feature repulsion loss L_feature_force is calculated as follows:

d_loss = Nearest_Neighbor_distance(Z_min,KF, Z_maj,KF, n)

where Nearest_Neighbor_distance() is a distance-loss calculation function that, for each z_min,KF ∈ Z_min,KF, finds the n nearest neighbors in Z_maj,KF (n = 20) and returns the set d_loss of the average distances obtained over all of Z_min,KF. L_feature_class is the class feature repulsion loss: the first dimension of the important-feature code is defined to represent the class information of the sample, ρ is a hyperparameter serving as the feature repulsion label of minority-class samples, and −ρ is the feature repulsion label of majority-class samples; μ1 and μ2 are hyperparameters.
105, superposing the reconstruction error of each dimension of the sample as the supplement of the important feature code by using a mixed coding technology, judging the classification result of the sample to be detected under each two-class data set according to the supplement, and obtaining the fault category of the sample by hard voting, wherein the steps are as follows:
based on the cyclic training model from step 101 to step 104, a reliable balanced data set can be obtainedCombining important feature codes of the samples and reconstruction errors e of each dimension to serve as new sample features for distinguishing sample categories; for an input sample x, its structural features are denoted as F ═ zKF,e]And e is calculated as follows:
F and the corresponding class label are input into a random forest classifier for training, yielding a single imbalanced binary classifier; repeating this process yields 11 random forest classifiers RF_j, where j ∈ [1, 11] is the classifier index. For a sample x_test under test, its predicted label is calculated as follows:
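One hedged reading of the prediction step, with scikit-learn as an assumed stand-in: one binary random forest per fault class (class j vs. rest), with ties between positive votes broken by the positive-class probability (the tie-breaking rule is an assumption, not stated in the text):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
N_CLASSES = 11
X = rng.standard_normal((330, 14))            # stand-in for the features F = [z_KF, e]
y = rng.integers(0, N_CLASSES, size=330)      # toy fault-category labels

# Train one binary RF_j per fault class, mirroring the 11 binary data sets.
forests = []
for j in range(N_CLASSES):
    rf = RandomForestClassifier(n_estimators=25, random_state=j)
    rf.fit(X, (y == j).astype(int))
    forests.append(rf)

def predict_hard_vote(x):
    """Each binary classifier scores its own class; the class with the
    strongest positive score wins the vote."""
    scores = np.array([rf.predict_proba(x.reshape(1, -1))[0, 1] for rf in forests])
    return int(np.argmax(scores))

pred = predict_hard_vote(X[0])
```

Any multi-class strategy over the 11 binary classifiers fits the same skeleton; only `predict_hard_vote` would change.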
FIG. 2 is a schematic flow chart of the method for smart meter fault classification: the smart meter fault history data set is first divided by category into several binary data sets; VAE-SLGAN is applied to balance each sample set, producing several balanced binary data sets on which random forest binary classifiers are respectively trained; for a sample under test, the classifiers obtained from the binary data sets each produce a judgment, and the final predicted category is obtained by hard voting.
FIG. 3 is a detailed schematic diagram of the algorithm of the present invention, which proposes a latent-code reconstruction technique based on a fusion model of VAE and GAN to address the mode collapse occurring in the sample generation process; a feature repulsion technique and a hybrid coding technique are designed to address the difficulty of feature mining and of classifying overlapping regions.
Algorithm 1 is the pseudocode of the method for calculating the number of samples to generate for each minority-class sample:
Algorithm 2 is the pseudocode of the global sample generation method:
Algorithm 3 is the pseudocode of the feature repulsion calculation:
In the specific embodiment, testing uses smart meter fault history data sets of different categories; the data set collects smart meter data of 11 fault types from 25 provinces. Owing to factors such as manual statistics and external conditions, the data labels contain errors and omissions; after data cleaning with techniques such as cluster analysis, missing-value completion, and outlier processing, 15885 fault samples remain. To reduce the randomness of the results, the data set was randomly divided into a training set and a test set at a ratio of 8:2 with a fixed random seed.
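The reproducible 8:2 split can be sketched with scikit-learn (the seed value 42 and the stratification are assumptions; the text only states a fixed random seed):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.standard_normal((15885, 9))           # placeholder for the cleaned fault samples
y = rng.integers(0, 11, size=15885)           # placeholder fault-category labels

# A fixed random_state reproduces the same 8:2 split on every run;
# stratify=y keeps the per-class fault ratios in both halves (an assumption).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```

Stratification matters here because several fault classes are rare; an unstratified split could leave a minority class nearly absent from the test set.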
TABLE 1 data set used in the specific examples
To verify the effectiveness of the proposed algorithm, the embodiment of the present invention compares against 8 mainstream oversampling methods and 5 mainstream deep learning sample generation methods, as shown in Table 2. Considering the large data volume and complex categories of the smart meter fault data set, RF is used as the classifier to verify the sample balancing effect. The embodiment of the invention is denoted VAE-SLGAN in the tables.
TABLE 2 algorithm for comparison in the specific examples
The embodiment of the invention uses the macro-F1 and G-mean indexes to evaluate the classification effect of the algorithm. The macro-F1 index is the arithmetic mean of the per-category F1-measure and comprehensively evaluates the precision and recall of the model on each category. The G-mean index is the geometric mean of the per-category recall and evaluates the recall of the model across categories. Both macro-F1 and G-mean range from 0 to 1, and larger values indicate better classification performance.
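The two indexes follow directly from their definitions above; a small worked example with scikit-learn (toy labels, purely illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score, recall_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

# macro-F1: arithmetic mean of the per-category F1-measure
macro_f1 = f1_score(y_true, y_pred, average="macro")

# G-mean: geometric mean of the per-category recall
recalls = recall_score(y_true, y_pred, average=None)
g_mean = float(np.prod(recalls) ** (1.0 / len(recalls)))
```

Because the G-mean multiplies per-class recalls, a single class with recall near zero drags the whole score toward zero, which is exactly why it is a sensitive indicator for imbalanced fault data.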
Table 3 compares the embodiment of the invention with the mainstream oversampling methods on the F1-measure under each smart meter fault category and on macro-F1; Table 4 compares them on per-category recall and G-mean. The proposed smart meter fault classification method based on sample global rebalancing obtains F1-measure and recall exceeding the other methods on most categories, and obtains the highest macro-F1 and G-mean. Taken together, Tables 3 and 4 show that its balancing effect on smart meter fault samples is better than that of the oversampling methods, yielding higher classification precision and recall.
Table 3 experimental results of VAE-SLGAN and oversampling method on F1-measure and macro-F1 indexes under each fault category of intelligent electric meter
Table 4 experimental results of VAE-SLGAN and oversampling method on recall rate and G-mean index of intelligent electric meter under each fault category
Table 5 compares the embodiment of the invention with the mainstream deep learning generation methods on the F1-measure under each smart meter fault category and on macro-F1; Table 6 compares them on per-category recall and G-mean. The proposed method again obtains F1-measure and recall exceeding the other methods on most categories, and the highest macro-F1 and G-mean. Taken together, Tables 5 and 6 show that its balancing effect on smart meter fault samples is better than that of the deep learning generation methods, yielding higher classification precision and recall.
Table 5 experimental results of VAE-SLGAN and deep learning generation method on F1-measure and macro-F1 indexes under each fault category of intelligent electric meter
Table 6 experimental results of VAE-SLGAN and deep learning generation method on recall rate and G-mean index of intelligent electric meter under each fault category
Extensive comparisons against mainstream oversampling methods and deep learning sample generation methods show that, when dealing with the smart meter multi-classification problem characterized by multiple distribution modes within classes and feature overlap between classes, the method can reliably balance the samples, provide the classifier with features helpful for mining class differences, and effectively improve the precision and recall of faulty meter classification.
In summary, the embodiments of the present invention have the following beneficial effects:
In the technical scheme, fault history data of smart meters of different categories is used as the input data set and divided into several binary data sets; for each binary data set, a fusion model of VAE and GAN is constructed, each sample is taken as model input, and its latent code is split into an important-feature code and a secondary-feature code; variant latent codes of the samples are obtained by the latent-code reconstruction technique, and several reliable similar variant samples that retain the important features of the input samples are generated through decoder restoration, mutual-information constraint, and the adversarial discriminator; a feature repulsion technique acting between the latent codes of the two sample classes performs supervised feature representation learning; and the per-dimension reconstruction errors of a sample are superposed by the hybrid coding technique as a supplement to its important-feature code, the classification result of the sample under test is determined under each binary data set accordingly, and its fault category is obtained by hard voting.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (5)
1. A smart meter fault classification method based on sample global rebalancing is characterized by comprising the following steps:
(1) Fault history data of smart meters of different categories is taken as the input data set and divided into a plurality of binary data sets, specifically:
An actual smart meter fault data set is input; the samples in the data set comprise 9 characteristic variables: working duration, factory batch number, power-supply-unit number, electric energy meter type, fault-identification month, installation month, province, equipment specification, communication mode, and equipment identifier. The fault category labels comprise 11 types: appearance fault, metering fault, storage unit fault, processing unit fault, display unit fault, control unit fault, power supply unit fault, communication unit fault, clock unit fault, other fault, and software fault. Each category of samples in the fault data set is traversed: all samples of that category are taken as the minority-class sample set and all samples of the other categories as the majority-class sample set, converting the original data set into 11 binary data sets. Each binary data set can be described as:
X = [X_min, X_maj]
where X is a binary data set and x denotes an arbitrary sample in it, i.e. x ∈ X; X_min is the minority-class sample set, with x_min ∈ X_min an arbitrary minority-class sample; X_maj is the majority-class sample set, with x_maj ∈ X_maj an arbitrary majority-class sample;
(2) For each binary data set, a fusion model of VAE and GAN is constructed; each sample is taken as model input, and its latent code is split into an important-feature code and a secondary-feature code, specifically:
Based on the imbalanced binary data set X = [X_min, X_maj] obtained in step (1), a VAE/GAN model is built and trained:
the VAE learns the implicit posterior distribution q (z | x) of the sample x by the encoder and the distribution p (x | z) of the sample x by the decoder as follows:
z~Enc(x)=q(z|x)
where Enc is the encoder and Dec the decoder; z is the latent code obtained by passing sample x through the VAE encoder, and x̂ is the reconstructed sample generated by the VAE;
The decoder of the VAE is fused with the generator of the GAN to obtain the fusion model of VAE and GAN, whose optimization target is as follows:
where α1 and α2 are hyperparameters; L_prior is the prior loss arising from the difference between the distributions q(z|x) and N(0, I), with N(0, I) the normal distribution of mean 0 and variance 1 and D_KL(·) the KL divergence; L_likelihood is the likelihood loss obtained by sampling q(z|x) into p(x|z); L_GAN is the adversarial loss of the GAN; z_min is the latent code of x_min and z_maj the latent code of x_maj; E[·] is the expectation calculation function and Dis is the discriminator;
Based on this optimization target, an m-dimensional latent code z containing the features of a given sample x can be obtained. During training, the latent code of sample x is split into an important-feature code z_KF and a secondary-feature code z_SF. The important-feature code corresponds to important attributes of the sample — a high-level feature representation that is key to determining the sample category — while the secondary-feature code captures individualized attributes of the sample with no generality, whose change does not alter the sample category. That is, the first m1 dimensions of z form z_KF and the last m2 dimensions form z_SF, with m = m1 + m2; the latent code z_min of sample x_min is expressed as follows:
z_min = [z_min,KF, z_min,SF] = Enc(x_min)
where z_min,KF is the important-feature code of x_min and z_min,SF is its secondary-feature code;
(3) Variant latent codes of the samples are obtained by the latent-code reconstruction technique, and several reliable similar variant samples that retain the important features of the input samples are generated through decoder restoration, mutual-information constraint, and discriminator adversarial training, specifically:
Based on the sample latent codes obtained in step (2) and their important-feature and secondary-feature codes, the important-feature code is kept unchanged and the secondary-feature code is randomly replaced to generate similar samples sharing the characteristics of the given sample;
1) The variant latent code and the decoder restoration process of the variant sample are calculated as follows:
where the variant latent code is obtained by keeping the important-feature code z_min,KF and replacing the secondary-feature code with one sampled from its prior normal distribution N(0, I), and the variant sample is obtained by passing the variant latent code through the decoder;
To make the important-feature codes correspond to important attributes of the samples, a mutual-information maximization constraint between the minority-class samples and their important-feature codes must be introduced on top of the optimization target of step (2): the important-feature code of a sample can be inferred back through a mutual-information inference model Q. The mutual-information optimization target L_Inference is as follows:
where β1 is a hyperparameter and the inference model Q is realized by the encoder Enc; z′_min,KF is the important-feature code of the reconstructed sample obtained by reverse inference, and z″_min,KF is the important-feature code of the variant sample obtained by reverse inference;
2) To generate reliable variant samples, the GAN optimization target L_GAN of step (2) is modified into the following two parts:
where L_Gen is the optimization target of the generator and L_Dis that of the discriminator; γ1, γ2, γ3, and γ4 are hyperparameters;
3) For each minority-class sample x_min in the data set, the above process is applied to obtain its variant latent codes; after the traversal is finished, the obtained variant samples are combined with the original minority-class samples into a new minority-class sample set, which is sampled until the number of samples matches the majority class, yielding a balanced data set;
(4) A feature repulsion technique acting between the latent codes of the two sample classes is designed for supervised feature representation learning, specifically:
Based on the balanced data set obtained in step (3), for an arbitrary batch of samples in a training cycle take Z_maj,KF and Z_min,KF, where Z_maj,KF is the feature set formed by the important-feature latent codes of all majority-class samples in the batch, and Z_min,KF is the feature set formed by the important-feature latent codes of all minority-class samples in the batch; the feature repulsion loss L_feature_force is calculated as follows:
d_loss = Nearest_Neighbor_distance(Z_min,KF, Z_maj,KF, n)
where Nearest_Neighbor_distance() is a distance-loss calculation function: for each z_min,KF ∈ Z_min,KF it finds the n nearest neighbors in Z_maj,KF (with n = 20) and returns the set of average distances d_loss over all of Z_min,KF. L_feature_class is the class feature-repulsion loss; the first dimension of the important-feature code is defined to represent the class of the sample, with the hyperparameter ρ as the feature-repulsion label of minority-class samples and −ρ as that of majority-class samples; μ1 and μ2 are hyperparameters;
(5) The per-dimension reconstruction errors of a sample are superposed by the hybrid coding technique as a supplement to its important-feature code; the classification result of the sample under test is determined under each binary data set accordingly, and its fault category is obtained by hard voting, specifically:
Based on the cyclic training of the models in steps (1) to (4), a reliable balanced data set can be obtained. The important-feature code of a sample is combined with its per-dimension reconstruction error e as a new sample feature for distinguishing sample categories; for an input sample x, the combined feature is denoted F = [z_KF, e], and e is calculated as follows:
F and the corresponding class label are input into a random forest classifier for training, yielding a single imbalanced binary classifier; repeating this process yields 11 random forest classifiers RF_j, where j ∈ [1, 11] is the classifier index. For a sample x_test under test, its predicted label is calculated as follows:
2. The method for classifying the faults of the smart meter based on the sample global rebalancing according to claim 1, wherein in the step (2), the structures of the encoder Enc, the decoder Dec and the discriminator Dis are as follows:
wherein Reshape () is a tensor warping function; conv1D () is a one-dimensional convolutional layer building function; flatten () is a tensor flattening function; dense () is the full connection layer building function; LeakyRelu and tanh represent the corresponding activation functions; dropout () is a random deactivation function.
3. The method for classifying faults of smart meters based on sample global rebalancing according to claim 1, wherein in step (2), α1 and α2 take the values 0.1 and 1, respectively; m1 is 5 and m2 is 1.
4. The method for classifying faults of smart meters based on sample global rebalancing according to claim 1, wherein in step (3), β1 is 1; γ1 and γ2 are both 0.5, and γ3 and γ4 are both 1.
5. The method for classifying faults of smart meters based on sample global rebalancing according to claim 1, wherein in step (4), ρ takes the value 0.4; μ1 and μ2 take the values 1 and 2, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210348671.1A CN114781495A (en) | 2022-04-01 | 2022-04-01 | Intelligent ammeter fault classification method based on sample global rebalancing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114781495A true CN114781495A (en) | 2022-07-22 |
Family
ID=82426386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210348671.1A Pending CN114781495A (en) | 2022-04-01 | 2022-04-01 | Intelligent ammeter fault classification method based on sample global rebalancing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114781495A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117092446A (en) * | 2023-10-20 | 2023-11-21 | 国网山东省电力公司嘉祥县供电公司 | Power transmission line fault detection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||