CN114781495A - Intelligent ammeter fault classification method based on sample global rebalancing - Google Patents

Intelligent ammeter fault classification method based on sample global rebalancing Download PDF

Info

Publication number
CN114781495A
CN114781495A CN202210348671.1A CN202210348671A CN114781495A CN 114781495 A CN114781495 A CN 114781495A CN 202210348671 A CN202210348671 A CN 202210348671A CN 114781495 A CN114781495 A CN 114781495A
Authority
CN
China
Prior art keywords
sample
samples
codes
data set
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210348671.1A
Other languages
Chinese (zh)
Inventor
贾欣
高欣
薛冰
黄子健
傅世元
孟之航
黄旭
张光耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202210348671.1A priority Critical patent/CN114781495A/en
Publication of CN114781495A publication Critical patent/CN114781495A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01RMEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R35/00Testing or calibrating of apparatus covered by the other groups of this subclass
    • G01R35/04Testing or calibrating of apparatus covered by the other groups of this subclass of instruments for measuring time integral of power or current
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiment of the invention provides a sample global rebalancing-based intelligent electric meter fault classification method, which comprises the following steps: dividing fault history data under different categories of the intelligent electric meters to obtain a plurality of second-class data sets; constructing a fusion model of VAE and GAN aiming at each class II data set, respectively taking each sample as the input of the model, and dividing the hidden codes of the samples into important feature codes and secondary feature codes; obtaining variant implicit codes of the samples by an implicit code reconstruction technology, and generating a plurality of reliable similar variant samples considering the important characteristics of input samples through the reduction of a decoder and the countermeasures of mutual information constraint and a discriminator; designing a characteristic repulsion technology acting between two types of sample implicit codes to carry out supervised characteristic representation learning; and superposing the reconstruction errors of all dimensions of the sample by a mixed coding technology to supplement the important feature codes, judging the classification result of the sample to be detected under each two-class data set according to the judgment result, and obtaining the fault category of the sample by hard voting.

Description

Intelligent ammeter fault classification method based on sample global rebalancing
[ technical field ] A
The invention relates to a fault classification method for an intelligent electric meter, in particular to a fault classification method for an intelligent electric meter based on sample global rebalancing.
[ background ] A method for producing a semiconductor device
Since the 21 st century, the intelligent electric meter is popularized and applied in the field of power utilization information collection on a large scale, the electric power industry enters a big data era, the intelligent electric meter is used as a terminal device under the current intelligent power grid construction in China, and the intelligent electric meter integrates multiple functions of metering, displaying, communication and the like, and plays an important supporting role in stable operation of a power grid. Along with the development of the smart power grid, the functions of the smart power meter are increasingly enriched, the fault types of the smart power meter are also increasingly increased, and the timely maintenance of the fault power meter has great significance for the stable operation of the power grid and the stable power utilization of users. However, because the operation and maintenance personnel often only have the maintenance experience for some specific faulty meters, when the smart meter fails, it is difficult to dispatch the maintenance personnel with the relevant maintenance experience to repair the meter without knowing the fault type of the meter. Therefore, accurate prediction of the fault type of the smart meter is crucial.
With the development of a smart power grid, the demand on a smart electric meter is increased day by day, smart electric meter manufacturers increase, design schemes, component types and process flows of different manufacturers are different, and a complex production environment and an application environment are added, so that the fault types of the smart electric meter are complex and various, the fault occurrence is influenced by the factors, and the finding of the mapping relation between the factors and the fault types is extremely complex, which is a typical machine learning problem. In addition, the occurrence frequency of different fault types of electric meters is different, fault data show multi-mode distribution characteristics, and therefore the prediction difficulty of the intelligent electric meter is increased. Learning directly based on unbalanced data can result in a shift of the classifier's prediction results to a majority class. The balance among classes can be realized by a data sampling means, but the method lacks a guarantee mechanism for the authenticity of a synthesized sample; the generated countermeasure network can improve the authenticity of the generated samples through the countermeasure game of the generator and the discriminator, but the problem of mode collapse exists, and the consistency of the distribution of a few types of samples before and after balance cannot be ensured. Based on the analysis, the invention provides a smart meter fault classification method based on sample global rebalancing so as to improve the performance of smart meter fault classification.
[ summary of the invention ]
In view of this, the invention provides a method for classifying faults of an intelligent electric meter based on sample global rebalancing, so as to improve the performance of fault classification of the intelligent electric meter.
The invention provides a sample global rebalancing-based intelligent electric meter fault classification method, which comprises the following steps of:
(1) taking fault historical data of the intelligent electric meters in different categories as an input data set, and dividing to obtain a plurality of secondary data sets, specifically:
inputting an actual fault data set of the intelligent electric meter, wherein samples in the data set comprise 9 characteristic variables of working duration, arrival batch number, power supply unit number, electric energy meter category, fault identification month, installation month, province, equipment specification, communication mode and equipment identification; the fault category labels of the system comprise 11 types of appearance faults, metering faults, storage unit faults, processing unit faults, display unit faults, control unit faults, power supply unit faults, communication unit faults, clock unit faults, other faults and software faults; traversing each type of sample in the fault data set, taking all samples in the type as a few type of sample sets, taking all samples in other types as a majority type of sample set, and converting the original data set into 11 second type data sets; for each of the two types of data sets, the data set can be described as:
X=[Xmin,Xmaj]
wherein X is a class II data set, and X is defined as any sample in the data set, namely X belongs to X; xminFor a minority class sample set, define xminFor any sample in the dataset, i.e. xmin∈Xmin;XmajFor most classes of sample sets, define xmajFor any sample in the data set, i.e. xmaj∈Xmaj
(2) Constructing a fusion model of VAE and GAN aiming at each two-class data set, respectively taking each sample as the input of the model, and dividing the hidden codes into important feature codes and secondary feature codes, wherein the method specifically comprises the following steps:
based on the unbalanced binary data set X ═ X obtained in the step (1)min,Xmaj]Building a VAE/GAN model and training:
the VAE learns the implicit posterior distribution q (z | x) of the sample x by the encoder and the distribution p (x | z) of the sample x by the decoder as follows:
z~Enc(x)=q(z|x)
Figure BDA0003578221250000031
where Enc is the encoder, Dec is the decoder, z is the implicit code obtained by VAE encoding samples x,
Figure BDA0003578221250000032
reconstructed samples generated for VAE;
and fusing a decoder of the VAE with a generator of the GAN to obtain a fusion model of the VAE and the GAN, wherein the optimization goal is as follows:
Figure BDA0003578221250000033
Figure BDA0003578221250000034
Figure BDA0003578221250000035
wherein alpha is1,α2Is a root of Chao ShenNumber, LpriorThe prior loss due to the difference in the distribution of q (z | x) and N (0, I), where N (0, I) is a normal distribution with a mean of 0 and a variance of 1, and DKL() Is KL divergence; l is a radical of an alcohollikelihoodObtaining a likelihood loss, L, for p (x | z) for sampling q (z | x)GANIs the antagonism loss of GAN; z is a radical ofminIs xminIs implicitly coded, zmajIs xmajHidden coding of (2); e2]For the expected computation function, Dis is the discriminator;
based on the optimization target, m-dimensional hidden codes z containing the characteristics of a given sample x can be obtained; the method is characterized in that the hidden coding of a VAE model to a sample x is divided into important feature codes z in the training processKFAnd secondary feature coding zSFTwo parts; the important feature codes correspond to some important attributes of the samples, are high-level feature representations of the samples and are key features for determining sample categories, and the secondary feature codes are personalized attributes of the samples, generally have no generality, and the change of the secondary feature codes does not cause the change of the sample categories; namely: front m of division z1Dimension zKFAfter m2Dimension is zSFAnd m is equal to m1+m2Sample xminIs steganographically coded zminIs shown as follows:
zmin=[zmin,KF,zmin,SF]
=Enc(xmin)
wherein z ismin,KFIs xminIs coded for the important feature, zmin,SFIs xminEncoding the secondary features of (1);
(3) obtaining variant implicit codes of the samples by an implicit code reconstruction technology, and generating a plurality of reliable similar variant samples considering the important characteristics of input samples by the reduction of a decoder, mutual information constraint and the confrontation of a discriminator, wherein the method specifically comprises the following steps:
based on the sample implicit codes obtained in the step (2) and the important feature codes and the secondary feature codes thereof, keeping the important feature codes unchanged, and randomly replacing the secondary feature codes to generate similar samples with the same characteristics as the given sample;
1) the decoder recovery process of the variant implicit code and the variant samples is calculated as follows:
Figure BDA0003578221250000041
Figure BDA0003578221250000042
wherein,
Figure BDA0003578221250000043
in order to obtain a variation-latent code,
Figure BDA0003578221250000044
is composed of
Figure BDA0003578221250000045
Obtaining a variant sample;
Figure BDA0003578221250000046
sampling from its a priori normal distribution N (0, I);
in order to make the important feature codes correspond to some important attributes of the samples, a mutual information maximization constraint of a few types of samples and the important feature codes thereof needs to be introduced on the basis of the optimization target in the step (2), namely, the important feature codes of the samples can be reversely deduced through a mutual information reasoning model Q, and an optimization target L of the mutual informationInforenceThe following:
Figure BDA0003578221250000047
wherein beta is1Is a hyperparameter, z ″)min,KFIs a variant sample
Figure BDA0003578221250000048
The model Q is replaced by an encoder Enc; z'min,KFFor reconstructing samples
Figure BDA0003578221250000049
Coding of important features by reverse reasoning, i.e.
Figure BDA00035782212500000410
z″min,KFIs a variant sample
Figure BDA00035782212500000411
Coding of important features by reverse reasoning, i.e.
Figure BDA00035782212500000412
2) Modifying the optimization target L of GAN in step (2) in order to generate reliable variant samplesGANThe method comprises the following two parts:
Figure BDA0003578221250000051
Figure BDA0003578221250000052
wherein L isGenTo optimize the goal of the generator, LDisIs an optimization target of the discriminator; gamma ray1、γ2、γ3、γ4Is a hyper-parameter;
3) for each minority class sample x in the datasetminObtaining variant latent codes by applying the process; after the traversal is finished, combining the obtained variant sample with the original minority sample to obtain a new minority sample set
Figure BDA0003578221250000053
Sampling until the number of the samples is consistent with that of most samples to obtain a balanced data set
Figure BDA0003578221250000054
(4) A characteristic repulsive force technology acting between two types of sample implicit codes is designed to carry out supervised characteristic representation learning, and the method specifically comprises the following steps:
based onThe balance data set obtained in the step (3)
Figure BDA0003578221250000055
Z for arbitrarily taking a batch of samples in a training cyclemaj,KFAnd Zmin,KFWherein Z ismaj,KFA feature set formed by latent coding of important features of all the samples of the plurality of classes in the training batch, Zmin,KFA feature set formed by the hidden codes of the important features of all the few classes of samples in the training batch; characteristic repulsive force Lfeature_forceIs calculated as follows:
dloss=Nearest_Neighbor_distance(Zmin,KF,Zmaj,KF,n)
Figure BDA0003578221250000056
Figure BDA0003578221250000057
wherein, the Nearest _ Neighbor _ distance () is a distance loss calculation function, and the function thereof is as follows: looking for zmin,KFAt Zmaj,KFN nearest neighbors in (1), wherein n is 20, zmin,KF∈Zmin,KFAnd return all Zmin,KFThe obtained average distance set dloss;Lfeature_classDefining important feature code first dimension to represent class information of samples, wherein rho is a super parameter and is a feature repulsion label of a few types of samples, and-rho is a feature repulsion label of a plurality of types of samples; mu.s1And mu2Is a hyper-parameter;
(5) and superposing the reconstruction error of each dimension of the sample by a mixed coding technology to supplement the important feature codes, judging the classification result of the sample to be detected under each two-class data set according to the judgment result, and obtaining the fault category of the sample by hard voting, wherein the method specifically comprises the following steps of:
based on the cyclic training model of the steps (1) to (4), a reliable balanced data set can be obtained
Figure BDA0003578221250000058
Combining important feature codes of the samples and reconstruction errors e of each dimension to serve as new sample features for distinguishing sample categories; for an input sample x, its structural features are denoted as F ═ zKF,e]E is calculated as follows:
Figure BDA0003578221250000061
inputting the F and the corresponding class label into a random forest classifier for training to obtain a single unbalanced two classifier; repeating the above process can obtain 11 random forest classifiers RFjJ is subscript of random forest classifier, j belongs to [1,11 ]](ii) a For the sample x to be measuredtestWhich predicts the label
Figure BDA0003578221250000062
Is calculated as follows:
Figure BDA0003578221250000063
Figure BDA0003578221250000064
when j is a value, x is representedtestThe predicted failure category of (1) is a jth failure.
In the above method, in the step (2), the structures of the encoder Enc, the decoder Dec and the discriminator Dis are as follows:
Figure BDA0003578221250000065
where Reshape () is a tensor warping function; conv1D () is a one-dimensional convolutional layer building function; flatten () is a tensor flattening function; dense () is the full connection layer build function; LeakyRelu and tanh represent the corresponding activation functions; dropout () is a random deactivation function.
In the above method, the step (2)In alpha1And alpha2Are respectively 0.1 and 1; m is1Is 5, m2Is 1.
In the above process, in the step (3), β1Is 1; gamma ray1、γ2All values of (a) are 0.5, gamma3、γ4The values of (A) are all 1.
In the method, in the step (4), ρ is 0.4; mu.s1And mu2The values of (a) are 1 and 2, respectively.
According to the intelligent electric meter fault classification method based on sample global rebalancing, the accuracy and the recall rate of intelligent electric meter fault classification are improved.
According to the technical scheme, the invention has the following beneficial effects:
in the technical scheme implemented by the invention, the sample generation aiming at the specified characteristics is realized through redefining the VAE hidden code, the problems of data authenticity and overfitting caused by a data oversampling algorithm are solved, and the problem of mode running existing in the conventional sample generation algorithm is improved; the two modes of designing a characteristic repulsion technology and a mixed coding technology relieve the problem of difficult characteristic extraction and classification of samples in an overlapping area. The method can improve the robustness of the model, so that the accuracy and the recall rate of fault classification of the intelligent electric meter are improved.
[ description of the drawings ]
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.
FIG. 1 is a schematic flow chart of a framework of a fault classification method for an intelligent electric meter based on sample global rebalancing, which is provided by the invention;
FIG. 2 is a schematic flow chart of a fault classification of the smart meter;
fig. 3 is a detailed schematic of the algorithm of the present invention.
[ detailed description ] A
In order to better understand the technical scheme of the invention, the invention is described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments of the invention are only some of the embodiments of the invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a method for classifying faults of an intelligent ammeter based on sample global rebalancing. In order to meet the fault classification of the intelligent electric meter, the invention disassembles the complex unbalanced multi-classification problem into a plurality of unbalanced two-classification problems, designs a hidden coding reconstruction technology to realize the generation of reliable variation samples considering the key characteristics of given samples, and obtains a plurality of balanced two-class data sets; the two modes of a characteristic repulsion technology and a mixed coding technology are designed to relieve the problem of difficult characteristic extraction and classification of samples in an overlapping area; and finally, integrating the classification results of the samples to be detected under each two-class data set, and obtaining the fault classes of the samples to be detected in a hard voting mode.
Fig. 1 is a schematic flow chart of a framework of a method for classifying faults of an intelligent electric meter based on sample global rebalancing, which includes the following steps:
step 101, taking fault history data of different types of smart electric meters as input data sets, and dividing the input data sets to obtain a plurality of two types of data sets, specifically:
inputting an actual fault data set of the intelligent electric meter, wherein samples in the data set comprise 9 characteristic variables of working duration, a number of a lot to be delivered, a power supply unit number, a type of the electric energy meter, a fault identification month, an installation month, a province, an equipment specification, a communication mode and an equipment identifier; the fault category labels of the fault category labels comprise 11 types of appearance faults, metering faults, storage unit faults, processing unit faults, display unit faults, control unit faults, power supply unit faults, communication unit faults, clock unit faults, other faults and software faults; traversing each type of sample in the fault data set, taking all samples in the type as a few type sample set, taking all samples in the other types as a majority type sample set, and converting the original data set into 11 second type data sets; for each of the two types of data sets, the data set can be described as:
X=[Xmin,Xmaj]
wherein X is a class II data set, and X is defined as any sample in the data set, namely X belongs to X; xminFor a minority class sample set, define xminFor any sample in the dataset, i.e. xmin∈Xmin;XmajFor most classes of sample sets, define xmajFor any sample in the dataset, i.e. xmaj∈Xmaj
Step 102, constructing a fusion model of the VAE and the GAN for each class II data set, respectively taking each sample as the input of the model, and dividing the hidden code of the model into an important feature code and a secondary feature code, specifically:
based on the unbalanced binary data set X ═ X obtained in step 101min,Xmaj]Building a VAE/GAN model and training:
the VAE learns the implicit variable posterior distribution q (z | x) of the sample x by the encoder and the distribution p (x | z) of the sample x by the decoder as follows:
z~Enc(x)=q(z|x)
Figure BDA0003578221250000091
where Enc is the encoder, Dec is the decoder, z is the implicit code obtained by subjecting sample x to a VAE encoder,
Figure BDA0003578221250000092
reconstructed samples generated for VAEs;
and fusing a decoder of the VAE with a generator of the GAN to obtain a fusion model of the VAE and the GAN, wherein the optimization goal is as follows:
Figure BDA0003578221250000093
Figure BDA0003578221250000094
Figure BDA0003578221250000095
wherein alpha is1,α2Is hyperparametric, LpriorThe prior loss due to the difference in the distribution of q (z | x) and N (0, I), where N (0, I) is a normal distribution with a mean of 0 and a variance of 1, and DKL() Is KL divergence; l is a radical of an alcohollikelihood sample q (z | x) to yield a likelihood loss of p (x | z), LGANIs the antagonism loss of GAN; z is a radical ofminIs xminLatent coding of (b), zmajIs xmajThe hidden coding of (2); e [ 2 ]]Dis is an identifier for the expected calculation function;
based on the optimization target, m-dimensional hidden codes z containing the characteristics of a given sample x can be obtained; the method is characterized in that the hidden coding of a VAE model to a sample x is divided into important feature codes z in the training processKFAnd secondary feature coding zSFTwo parts; the important feature codes correspond to some important attributes of the samples, are high-level feature representations of the important features and are key features for determining sample categories, and the secondary feature codes are personalized attributes of the samples, generally have no generality, and the change of the secondary feature codes does not cause the change of the sample categories; namely: front m of division z1Dimension is zKFAfter m is2Dimension zSFAnd m is equal to m1+m2Sample xminIs steganographically coded zminIs shown as follows:
zmin=[zmin,KF,zmin,SF]
=Enc(xmin)
wherein z ismin,KFIs xminCoding of the important features of (2), zmin,SFIs xminEncoding the secondary features of (3).
103, obtaining a variant implicit code of the input sample by an implicit code reconstruction technology, and generating a plurality of reliable similar variant samples considering the important characteristics of the input sample by the reduction of a decoder and the mutual information constraint and the countermeasure of a discriminator, wherein the method specifically comprises the following steps:
based on the sample implicit codes obtained in the step 102 and the important feature codes and the secondary feature codes thereof, keeping the important feature codes unchanged, and randomly replacing the secondary feature codes to generate similar samples with the same characteristics as the given sample;
1) the decoder recovery process of the variant implicit coding and the variant samples is calculated as follows:
Figure BDA0003578221250000101
Figure BDA0003578221250000102
wherein,
Figure BDA0003578221250000103
in order to obtain a variant-latent code,
Figure BDA0003578221250000104
is composed of
Figure BDA0003578221250000105
Obtaining a variant sample;
Figure BDA0003578221250000106
sampling from its a priori normal distribution N (0, I);
in order to make the important feature codes correspond to some important attributes of the samples, a mutual information maximization constraint of a few types of samples and the important feature codes thereof needs to be introduced on the basis of the optimization target in the step (2), namely, the important feature codes of the samples can be reversely deduced through a mutual information reasoning model Q, and an optimization target L of the mutual informationInfConference is as follows:
Figure BDA0003578221250000107
wherein beta is1Is a hyper-parameter, z ″)min,KFIs a variant sample
Figure BDA0003578221250000108
The important feature of (2) is coded in a hidden way, and the model Q is replaced by a coder Enc; z'min,KFFor reconstructing samples
Figure BDA0003578221250000109
Coding of important features by reverse reasoning, i.e.
Figure BDA00035782212500001010
z″min,KFIs a variant sample
Figure BDA00035782212500001011
Coding of important features by reverse reasoning, i.e.
Figure BDA00035782212500001012
2) Modifying the optimization objective L of GAN in step (2) in order to generate reliable variant samplesGANThe method comprises the following two parts:
Figure BDA0003578221250000111
Figure BDA0003578221250000112
wherein L isGenTo optimize the generator, LDisAn optimization objective for the arbiter; gamma ray1、γ2、γ3、γ4Is a hyper-parameter;
3) for each minority class sample x in the datasetminObtaining variant latent codes by applying the process; after the traversal is finished, combining the obtained variant sample with the original minority sample to obtain a new minority sample set
Figure BDA0003578221250000113
Sampling until the number of the samples is consistent with that of most samples to obtain a balanced data set
Figure BDA0003578221250000114
Step 104, designing a characteristic repulsion technology acting between two types of sample hidden codes for supervised characteristic representation learning, specifically:
based on the balanced data set obtained in step 103
Figure BDA0003578221250000115
Z for arbitrarily taking a batch sample in a training cyclemaj,KFAnd Zmin,KFWherein Z ismaj,KFA feature set formed by latent coding of important features of all the samples of the plurality of classes in the training batch, Zmin,KFA feature set formed by the hidden coding of the important features of all the few classes of samples in the training batch; characteristic repulsive force Lfeature_forceIs calculated as follows:
dloss=Nearest_Neighbor_distance(Zmin,KF,Zmaj,KF,n)
Figure BDA0003578221250000116
Figure BDA0003578221250000117
wherein, the Nearest _ Neighbor _ distance () is a distance loss calculation function, and the function thereof is as follows: finding zmin,KFAt Zmaj,KFN nearest neighbors, n is 20, zmin,KF∈Zmin,KFAnd returning all Zmin,KFThe resulting set of average distances dloss;Lfeature_classIs the class characteristic repulsion loss, defines the class information of the important characteristic code first dimension representation sample, rho is a hyperparameter, is the characteristic repulsion label of the minority class sample- ρ is the characteristic repulsive force signature of most classes of samples; mu.s1And mu2Is a hyper-parameter.
105, superposing the reconstruction error of each dimension of the sample as the supplement of the important feature code by using a mixed coding technology, judging the classification result of the sample to be detected under each two-class data set according to the supplement, and obtaining the fault category of the sample by hard voting, wherein the steps are as follows:
based on the cyclic training model from step 101 to step 104, a reliable balanced data set can be obtained
Figure BDA0003578221250000121
Combining important feature codes of the samples and reconstruction errors e of each dimension to serve as new sample features for distinguishing sample categories; for an input sample x, its structural features are denoted as F ═ zKF,e]And e is calculated as follows:
Figure BDA0003578221250000122
inputting the F and the corresponding class label into a random forest classifier for training to obtain a single unbalanced two classifier; repeating the above process can obtain 11 random forest classifiers RFjJ is subscript of random forest classifier, j belongs to [1,11 ]](ii) a For the sample x to be measuredtestWhich predicts the label
Figure BDA0003578221250000123
Is calculated as follows:
Figure BDA0003578221250000124
Figure BDA0003578221250000125
when j is a value, x is representedtestThe predicted failure category of (1) is a jth failure.
FIG. 2 is a schematic flow chart of the method for solving the problem of fault classification of the intelligent electric meters, and the method comprises the steps of firstly dividing fault history data sets of the intelligent electric meters in different classes to obtain a plurality of two-class data sets, respectively applying the VAE-SLGAN to balance sample sets to obtain a plurality of balanced two-class data sets, and respectively training a random forest two-classifier; and for the sample to be detected, obtaining classifiers for discrimination by using the two types of data sets respectively, and obtaining the final prediction category by a soft voting method.
FIG. 3 is a detailed schematic diagram of the algorithm of the present invention, which proposes a steganographic reconstruction technique based on a fusion model of VAE and GAN, for solving the problem of mode collapse occurring in the sample generation process; a feature repulsion technology and a hybrid coding technology are designed to solve the problems of difficulty in feature mining and classification of overlapped regions.
Algorithm 1 is a pseudo code of the generated number calculation method for each minority class of samples:
Figure BDA0003578221250000126
Figure BDA0003578221250000131
algorithm 2 is an algorithm pseudo code of the sample global generation method:
Figure BDA0003578221250000132
Figure BDA0003578221250000141
algorithm 3 is an algorithm pseudo code for feature repulsion calculation:
Figure BDA0003578221250000151
in the specific embodiment, the test is carried out by using fault history data sets of the intelligent electric meters under different types, and the data sets collect intelligent electric meter data of 11 fault types in 25 provinces in total. Due to the influence of various factors such as manual statistics and external conditions, the data labels have the conditions of wrong label, missing label and the like, and 15885 pieces of fault sample data in total are obtained after data cleaning is carried out through various technologies such as cluster analysis, missing value completion, abnormal value processing and the like. To reduce the randomness of the results, the data set was randomly divided into a training set and a test set using a ratio of 8:2 with fixed random numbers.
TABLE 1 data set used in the specific examples
Figure BDA0003578221250000152
Figure BDA0003578221250000161
In order to verify the effectiveness of the proposed algorithm, 8 mainstream oversampling methods and 5 mainstream deep learning sample generation methods are used for comparison in the embodiment of the present invention, as shown in table 2. Considering the characteristics of large data volume and complex category of the sample of the intelligent electric meter fault data set, RF is used as a classifier to verify the sample balance effect. The inventive examples are represented in the tables by VAE-SLGAN.
TABLE 2 algorithm for comparison in the specific examples
Figure BDA0003578221250000162
The embodiment of the invention uses macro-F1 and G-mean indexes to evaluate the classification effect of the algorithm. The macro-F1 index is the arithmetic mean of F1-measure of each category and is used for comprehensively evaluating the accuracy and the recall rate of the model on each category. The G-mean index is a geometric mean value of the recall rate of each category and is used for evaluating the recall condition of the model on each category. The values of Macro-F1 and G-mean are from 0 to 1, and the larger the value is, the better the classification performance of the model is.
The embodiment of the invention and the main flow sampling method are shown in the table 3 for the experimental result pairs on the F1-measure and macro-F1 indexes under each fault category of the intelligent electric meter, and in the table 4 for the experimental result pairs on the recall ratio and the G-mean index of each category. It can be seen that the intelligent electric meter fault classification method based on sample global rebalancing of the invention obtains F1-measure and recall rate which exceed those of other methods in most categories, and obtains the highest macro-F1 and G-mean. The results of table 3 and table 4 are combined to show that the balance effect of the fault sample of the intelligent ammeter is better than that of the oversampling method, and higher classification accuracy and recall rate can be obtained.
Table 3 experimental results of VAE-SLGAN and oversampling method on F1-measure and macro-F1 indexes under each fault category of intelligent electric meter
Figure BDA0003578221250000171
Table 4 experimental results of VAE-SLGAN and oversampling method on recall rate and G-mean index of intelligent electric meter under each fault category
Figure BDA0003578221250000172
Figure BDA0003578221250000181
The embodiment of the invention and the mainstream deep learning generation method are shown in the table 5 of the experimental result pair ratio of F1-measure and macro-F1 indexes under each fault category of the intelligent electric meter, and are shown in the table 6 of the experimental result pair ratio of recall rate and G-mean index of each category. It can be seen that the intelligent electric meter fault classification method based on sample global rebalancing of the invention obtains F1-measure and recall rate which exceed those of other methods in most categories, and obtains the highest macro-F1 and G-mean. The results of table 5 and table 6 are combined to show that the balance effect of the intelligent electric meter fault sample is better than that of the deep learning generation method, and higher classification accuracy and recall rate can be obtained.
Table 5 experimental results of VAE-SLGAN and deep learning generation method on F1-measure and macro-F1 indexes under each fault category of intelligent electric meter
Figure BDA0003578221250000182
Table 6 experimental results of VAE-SLGAN and deep learning generation method on recall rate and G-mean index of intelligent electric meter under each fault category
Figure BDA0003578221250000191
Compared with a main flow sampling method and a deep learning sample generation method, a large number of comparison results show that the method can reliably balance samples when dealing with the problem of multi-classification of the intelligent electric meter caused by the fact that multiple distribution modes exist in classes and data among the classes have characteristic overlapping, provide characteristics beneficial to mining class differences for classifier learning, and effectively improve the accuracy and recall rate of classification of the fault electric meter.
In summary, the embodiments of the present invention have the following beneficial effects:
in the technical scheme, fault historical data of different types of intelligent electric meters are used as input data sets, and a plurality of secondary data sets are obtained through division; constructing a fusion model of VAE and GAN for each two-class data set, respectively taking each sample as the input of the model, and dividing the hidden codes into important feature codes and secondary feature codes; obtaining variant implicit codes of the samples by an implicit code reconstruction technology, and generating a plurality of reliable similar variant samples considering the important characteristics of input samples through the reduction of a decoder and the countermeasures of mutual information constraint and a discriminator; designing a characteristic repulsion technology acting between two types of sample implicit codes to carry out supervised characteristic representation learning; and superposing the reconstruction errors of all dimensions of the sample by a mixed coding technology to supplement the important feature codes, judging the classification result of the sample to be detected under each two-class data set according to the judgment result, and obtaining the fault category of the sample by hard voting.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (5)

1. A smart meter fault classification method based on sample global rebalancing is characterized by comprising the following steps:
(1) taking fault historical data of the intelligent electric meters in different categories as an input data set, and dividing to obtain a plurality of secondary data sets, specifically:
inputting an actual fault data set of the intelligent electric meter, wherein samples in the data set comprise 9 characteristic variables of working duration, a number of a lot to be delivered, a power supply unit number, a type of the electric energy meter, a fault identification month, an installation month, a province, an equipment specification, a communication mode and an equipment identifier; the fault category labels of the fault category labels comprise 11 types of appearance faults, metering faults, storage unit faults, processing unit faults, display unit faults, control unit faults, power supply unit faults, communication unit faults, clock unit faults, other faults and software faults; traversing each type of sample in the fault data set, taking all samples in the type as a few type of sample sets, taking all samples in other types as a majority type of sample set, and converting the original data set into 11 second type data sets; for each of the two types of data sets, the data set can be described as:
X=[Xmin,Xmaj]
wherein X is a class II data set, and X is defined as any sample in the data set, namely X belongs to X; xminFor a minority class sample set, define xminFor any sample in the dataset, i.e. xmin∈Xmin;XmajFor most sample sets, define xmajFor any sample in the data set, i.e. xmaj∈Xmaj
(2) Constructing a fusion model of VAE and GAN aiming at each two-class data set, respectively taking each sample as the input of the model, and dividing the hidden codes into important feature codes and secondary feature codes, wherein the method specifically comprises the following steps:
based on the unbalanced binary data set X ═ X obtained in step (1)min,Xmaj]Building a VAE/GAN model and training:
the VAE learns the implicit posterior distribution q (z | x) of the sample x by the encoder and the distribution p (x | z) of the sample x by the decoder as follows:
z~Enc(x)=q(z|x)
Figure FDA0003578221240000011
where Enc is the encoder, Dec is the decoder, z is the implicit code obtained by subjecting sample x to a VAE encoder,
Figure FDA0003578221240000021
reconstructed samples generated for VAE;
fusing a decoder of the VAE with a generator of the GAN to obtain a fusion model of the VAE and the GAN, wherein the optimization goal is as follows:
Figure FDA0003578221240000022
Figure FDA0003578221240000023
Figure FDA0003578221240000024
wherein alpha is1,α2Is hyperparametric, LpriorThe prior loss due to the difference in the distribution of q (z | x) and N (0, I), where N (0, I) is a normal distribution with a mean of 0 and a variance of 1, and DKL() Is KL divergence; l islikelihoodObtaining a likelihood loss, L, for p (x | z) for sampling q (z | x)GANIs the antagonism loss of GAN; z is a radical of formulaminIs xminLatent coding of (b), zmajIs xmajHidden coding of (2); e2]Dis is an identifier for the expected calculation function;
based on the optimization target, m-dimensional hidden codes z containing the characteristics of a given sample x can be obtained; the method is characterized in that hidden codes of a VAE model to a sample x are divided into important feature codes z in the training processKFAnd secondary feature coding zSFTwo parts; the important feature codes correspond to some important attributes of the samples, are high-level feature representations of the samples and are key features for determining sample categories, and the secondary feature codes are personalized attributes of the samples, generally have no generality, and the change of the secondary feature codes does not cause the change of the sample categories; namely: dividing the front m of z1Dimension zKFAfter m2Dimension is zSFAnd m is equal to m1+m2Sample xminIs steganographically coded zminIs shown as follows:
zmin=[zmin,KF,zmin,SF]
=Enc(xmin)
wherein z ismin,KFIs xminIs coded for the important feature, zmin,SFIs xminEncoding the secondary features of (1);
(3) obtaining variant hidden codes of the samples by a hidden code reconstruction technology, reducing by a decoder, restraining mutual information and confronting by a discriminator to generate a plurality of reliable similar variant samples considering the important characteristics of input samples, which specifically comprises the following steps:
based on the sample implicit codes obtained in the step (2) and the important feature codes and the secondary feature codes thereof, keeping the important feature codes unchanged, and randomly replacing the secondary feature codes to generate similar samples with the same characteristics as the given samples;
1) the decoder recovery process of the variant implicit code and the variant samples is calculated as follows:
Figure FDA0003578221240000031
Figure FDA0003578221240000032
wherein,
Figure FDA0003578221240000033
in order to obtain a variant-latent code,
Figure FDA0003578221240000034
is made of
Figure FDA0003578221240000035
Obtaining a variant sample;
Figure FDA0003578221240000036
sampling from its a priori normal distribution N (0, I);
in order to make the important feature codes correspond to some important attributes of the samples, a mutual information maximization constraint of a few types of samples and the important feature codes thereof needs to be introduced on the basis of the optimization target in the step (2), namely, the important feature codes of the samples can be reversely deduced through a mutual information reasoning model Q, and an optimization target L of the mutual informationInforenceThe following:
Figure FDA0003578221240000037
wherein, beta1Is a hyper-parameter, z ″)min,KFIs a variant sample
Figure FDA0003578221240000038
The important feature of (2) is coded in a hidden way, and the model Q is replaced by a coder Enc; z'min,KFFor reconstructing samples
Figure FDA0003578221240000039
Coding of important features by reverse reasoning, i.e.
Figure FDA00035782212400000310
z″min,KFIs a variant sample
Figure FDA00035782212400000311
Coding of important features by reverse reasoning, i.e.
Figure FDA00035782212400000312
2) Modifying the optimization objective L of GAN in step (2) in order to generate reliable variant samplesGANThe method comprises the following two parts:
Figure FDA00035782212400000313
Figure FDA00035782212400000314
wherein L isGenTo optimize the generator, LDisIs an optimization target of the discriminator; gamma ray1、γ2、γ3、γ4Is a hyper-parameter;
3) for each minority class sample x in the datasetminObtaining variant implicit codes by applying the process; after the traversal is finished, combining the obtained variant sample with the original minority sample to obtain a new minority sample set
Figure FDA00035782212400000315
Sampling until the number of the samples is consistent with that of most samples to obtain a balanced data set
Figure FDA00035782212400000316
(4) A characteristic repulsion technology acting between two types of sample implicit codes is designed for supervised characteristic representation learning, and the method specifically comprises the following steps:
based on the balance number obtained in step (3)Data set
Figure FDA0003578221240000041
Z for arbitrarily taking a batch of samples in a training cyclemaj,KFAnd Zmin,KFWherein Z ismaj,KFA feature set consisting of all significant feature steganographic codes of a plurality of types of samples in the training batch, Zmin,KFA feature set formed by the hidden codes of the important features of all the few classes of samples in the training batch; characteristic repulsive force Lfeature_forceIs calculated as follows:
dloss=Nearest_Neighbor_distance(Zmin,KF,Zmaj,KF,n)
Figure FDA0003578221240000042
Figure FDA0003578221240000043
wherein, the Nearest _ Neighbor _ distance () is a distance loss calculation function, and the function thereof is as follows: finding zmin,KFAt Zmaj,KFN nearest neighbors, n is 20, zmin,KF∈Zmin,KFAnd returning all Zmin,KFThe resulting set of average distances dloss;Lfeature_classDefining important feature code first dimension to represent class information of samples, wherein rho is a super parameter and is a feature repulsion label of a few types of samples, and-rho is a feature repulsion label of a plurality of types of samples; mu.s1And mu2Is a hyper-parameter;
(5) and superposing the reconstruction error of each dimension of the sample by a mixed coding technology to supplement the important feature codes, judging the classification result of the sample to be detected under each two-class data set according to the judgment result, and obtaining the fault category of the sample by hard voting, wherein the method specifically comprises the following steps of:
based on the cyclic training model of the steps (1) to (4), a reliable balance data set can be obtained
Figure FDA0003578221240000044
Combining important feature codes of the samples and reconstruction errors e of each dimension to serve as new sample features for distinguishing sample categories; for an input sample x, its structural features are denoted as F ═ zKF,e]E is calculated as follows:
Figure FDA0003578221240000045
inputting the F and the corresponding class label into a random forest classifier for training to obtain a single unbalanced two classifier; repeating the above process can obtain 11 random forest classifiers RFjJ is subscript of random forest classifier, j belongs to [1,11 ]](ii) a For the sample x to be measuredtestWhich predict the label
Figure FDA0003578221240000046
Is calculated as follows:
Figure FDA0003578221240000047
Figure FDA0003578221240000051
when j is a value, x is representedtestThe predicted failure category of (1) is a jth failure.
2. The method for classifying the faults of the smart meter based on the sample global rebalancing according to claim 1, wherein in the step (2), the structures of the encoder Enc, the decoder Dec and the discriminator Dis are as follows:
Figure FDA0003578221240000052
wherein Reshape () is a tensor warping function; conv1D () is a one-dimensional convolutional layer building function; flatten () is a tensor flattening function; dense () is the full connection layer building function; LeakyRelu and tanh represent the corresponding activation functions; dropout () is a random deactivation function.
3. The method for classifying faults of intelligent electric meters based on sample global rebalancing according to claim 1, wherein in the step (2), a1And alpha2Are respectively 0.1 and 1; m is a unit of1Is 5, m2Is 1.
4. The method for classifying faults of intelligent electric meters based on sample global rebalancing according to claim 1, wherein in the step (3), β is1Is 1; gamma ray1、γ2All values of (a) are 0.5, gamma3、γ4All values of (A) are 1.
5. The intelligent electric meter fault classification method based on the sample global rebalancing of claim 1, wherein in the step (4), the value of p is 0.4; mu.s1And mu2The values of (a) are 1 and 2, respectively.
CN202210348671.1A 2022-04-01 2022-04-01 Intelligent ammeter fault classification method based on sample global rebalancing Pending CN114781495A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210348671.1A CN114781495A (en) 2022-04-01 2022-04-01 Intelligent ammeter fault classification method based on sample global rebalancing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210348671.1A CN114781495A (en) 2022-04-01 2022-04-01 Intelligent ammeter fault classification method based on sample global rebalancing

Publications (1)

Publication Number Publication Date
CN114781495A true CN114781495A (en) 2022-07-22

Family

ID=82426386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210348671.1A Pending CN114781495A (en) 2022-04-01 2022-04-01 Intelligent ammeter fault classification method based on sample global rebalancing

Country Status (1)

Country Link
CN (1) CN114781495A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117092446A (en) * 2023-10-20 2023-11-21 国网山东省电力公司嘉祥县供电公司 Power transmission line fault detection method and system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117092446A (en) * 2023-10-20 2023-11-21 国网山东省电力公司嘉祥县供电公司 Power transmission line fault detection method and system

Similar Documents

Publication Publication Date Title
CN110609759B (en) Fault root cause analysis method and device
CN109002933B (en) Distribution line route variable relation model optimization method based on Relieff and t-SNE
CN108985380B (en) Point switch fault identification method based on cluster integration
CN105184394B (en) Optimal control method based on CPS online data mining of power distribution network
CN106845717A (en) A kind of energy efficiency evaluation method based on multi-model convergence strategy
CN111967512A (en) Abnormal electricity utilization detection method, system and storage medium
CN103455563A (en) Data mining method applicable to integrated monitoring system of intelligent substation
CN112559963A (en) Power distribution network dynamic parameter identification method and device
CN110378744A (en) Civil aviaton's frequent flight passenger value category method and system towards incomplete data system
CN107632590A (en) A kind of bottom event sort method based on priority
CN110297469A (en) The production line fault judgment method of Ensemble feature selection algorithm based on resampling
CN114781495A (en) Intelligent ammeter fault classification method based on sample global rebalancing
CN115373879A (en) Intelligent operation and maintenance disk fault prediction method for large-scale cloud data center
CN118244191B (en) Uninterrupted power supply's electric power collection metering system
CN102374936A (en) Mechanical failure diagnostic method based on complex immune network algorithm
CN113922412A (en) Panorama evaluation method and system for new energy multi-station short circuit ratio, storage medium and computing device
CN104899101A (en) Dynamic distributing method of software testing resources based on multi-object difference evolutionary algorithm
CN110348540A (en) Electrical power system transient angle stability Contingency screening method and device based on cluster
CN113887623A (en) IFCM-BB-based transformer fault diagnosis method
CN106056305A (en) Power generation system reliability rapid assessment method based on state clustering
CN116543198A (en) Smart electric meter fault classification method based on multi-granularity neighbor graphs
CN114298413B (en) Hydroelectric generating set runout trend prediction method
Lai et al. Missing value imputations by rule-based incomplete data fuzzy modeling
Shan et al. Root Cause Analysis of Failures for Power Communication Network Based on CNN
CN108898264B (en) Method and device for calculating quality metric index of overlapping community set

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination