CN114781495A - Intelligent ammeter fault classification method based on sample global rebalancing - Google Patents
- Publication number: CN114781495A (application CN202210348671.1A)
- Authority
- CN
- China
- Prior art keywords: sample, samples, codes, data set, fault
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G01R35/04: Testing or calibrating of instruments for measuring time integral of power or current
- G06F18/24323: Pattern recognition; classification techniques; tree-organised classifiers
- G06N3/045: Neural networks; architecture; combinations of networks
Abstract
The embodiment of the invention provides a smart meter fault classification method based on sample global rebalancing, which comprises the following steps: partitioning the historical fault data of smart meters of different categories to obtain a plurality of two-class data sets; constructing, for each two-class data set, a fusion model of a VAE and a GAN, taking each sample as model input and splitting the sample's latent code into an important-feature code and a secondary-feature code; obtaining variant latent codes of the samples through a latent-code reconstruction technique, and generating multiple reliable similar variant samples that preserve the important features of the input sample through decoder reconstruction, a mutual-information constraint, and the adversarial discriminator; designing a feature repulsion technique acting between the latent codes of the two sample classes to perform supervised feature representation learning; and using a hybrid coding technique to append the per-dimension reconstruction errors of a sample as a supplement to its important-feature code, judging the classification result of the sample under test on each two-class data set accordingly, and obtaining its fault category by hard voting.
Description
[Technical Field]
The invention relates to a fault classification method for smart meters, and in particular to a smart meter fault classification method based on sample global rebalancing.
[Background]
Since the beginning of the 21st century, smart meters have been applied on a large scale in the field of electricity-consumption information collection, and the power industry has entered the big-data era. As the terminal device of China's current smart grid construction, the smart meter integrates metering, display, communication, and other functions, and plays an important supporting role in the stable operation of the grid. As the smart grid develops, smart meter functions grow increasingly rich and the number of fault types grows accordingly, so timely maintenance of faulty meters is of great significance for stable grid operation and stable electricity use. However, because operation and maintenance personnel often only have repair experience for certain specific faulty meters, when a smart meter fails it is difficult to dispatch personnel with the relevant maintenance experience without knowing the meter's fault type. Accurate prediction of the smart meter fault type is therefore crucial.
With the development of the smart grid, demand for smart meters grows daily and the number of manufacturers increases; design schemes, component types, and process flows differ across manufacturers, and complex production and application environments add further variation, so smart meter fault types are complex and diverse. Fault occurrence is influenced by all of these factors, and finding the mapping between them and the fault types is extremely complex, which is a typical machine learning problem. In addition, different fault types occur with different frequencies, so the fault data exhibit a multi-modal distribution, which further increases the difficulty of prediction. Learning directly from imbalanced data shifts the classifier's predictions toward the majority class. Balance between classes can be achieved by data sampling, but such methods lack a mechanism to guarantee the authenticity of synthesized samples; a generative adversarial network can improve the authenticity of generated samples through the adversarial game between generator and discriminator, but suffers from mode collapse and cannot ensure that the distribution of minority-class samples is consistent before and after balancing. Based on this analysis, the invention proposes a smart meter fault classification method based on sample global rebalancing to improve the performance of smart meter fault classification.
[Summary of the Invention]
In view of this, the invention provides a smart meter fault classification method based on sample global rebalancing, so as to improve the performance of smart meter fault classification.
The invention provides a smart meter fault classification method based on sample global rebalancing, which comprises the following steps:
(1) taking the historical fault data of smart meters of different categories as the input data set and partitioning it to obtain a plurality of two-class data sets, specifically:

inputting the actual smart meter fault data set, in which each sample comprises the characteristic variables working duration, arrival batch number, power supply unit number, electric energy meter category, fault identification month, installation month, province, equipment specification, communication mode, and equipment identification; the fault category labels comprise 11 types: appearance fault, metering fault, storage unit fault, processing unit fault, display unit fault, control unit fault, power supply unit fault, communication unit fault, clock unit fault, software fault, and other faults; traversing each class of samples in the fault data set, taking all samples of that class as the minority-class sample set and all samples of the other classes as the majority-class sample set, so that the original data set is converted into 11 two-class data sets; each two-class data set can be described as:

X = [X_min, X_maj]

where X is a two-class data set and x denotes any sample in it, i.e. x ∈ X; X_min is the minority-class sample set, with any sample x_min ∈ X_min; X_maj is the majority-class sample set, with any sample x_maj ∈ X_maj;
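As a minimal sketch, the one-vs-rest partition described above can be written as follows in pure Python (the function name and data layout are illustrative assumptions, not from the patent):

```python
def make_binary_datasets(samples, labels):
    """Convert a multi-class fault data set into one two-class data set
    per fault category: samples of category c form the minority set
    X_min, all remaining samples form the majority set X_maj."""
    datasets = {}
    for c in sorted(set(labels)):
        x_min = [s for s, y in zip(samples, labels) if y == c]
        x_maj = [s for s, y in zip(samples, labels) if y != c]
        datasets[c] = (x_min, x_maj)
    return datasets
```

Applied to the 11-category meter fault data set, this yields the 11 two-class data sets X = [X_min, X_maj] used below.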
(2) constructing, for each two-class data set, a fusion model of VAE and GAN, taking each sample as model input, and splitting its latent code into an important-feature code and a secondary-feature code, specifically:
based on the imbalanced two-class data set X = [X_min, X_maj] obtained in step (1), a VAE/GAN model is built and trained:

the VAE learns the posterior distribution q(z|x) of the latent variable of sample x with its encoder, and the distribution p(x|z) of sample x with its decoder:

z ~ Enc(x) = q(z|x)
x̂ ~ Dec(z) = p(x|z)

where Enc is the encoder, Dec is the decoder, z is the latent code obtained by encoding sample x with the VAE, and x̂ is the reconstructed sample generated by the VAE;

the decoder of the VAE is fused with the generator of the GAN to obtain the fusion model of VAE and GAN, whose optimization objective combines three weighted losses:

L = α1·L_prior + α2·L_likelihood + L_GAN
L_prior = D_KL(q(z|x) ‖ N(0, I))

where α1 and α2 are hyperparameters; L_prior is the prior loss caused by the difference between the distributions q(z|x) and N(0, I), N(0, I) being the normal distribution with mean 0 and variance 1 and D_KL(·) the KL divergence; L_likelihood is the likelihood loss obtained by sampling q(z|x) and evaluating p(x|z); L_GAN is the adversarial loss of the GAN; z_min is the latent code of x_min and z_maj that of x_maj; E[·] is the expectation operator and Dis is the discriminator;

based on this optimization objective, an m-dimensional latent code z containing the features of a given sample x is obtained; during training, the latent code that the VAE assigns to sample x is split into two parts, an important-feature code z_KF and a secondary-feature code z_SF; the important-feature code corresponds to important attributes of the sample, is a high-level feature representation, and is the key feature determining the sample's class, while the secondary-feature code captures individual attributes of the sample, generally without class generality, so a change in it does not change the sample's class; that is, the first m1 dimensions of z form z_KF, the last m2 dimensions form z_SF, and m = m1 + m2; the latent code z_min of sample x_min is expressed as:

z_min = [z_min,KF, z_min,SF] = Enc(x_min)

where z_min,KF is the important-feature code of x_min and z_min,SF its secondary-feature code;
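The split of the m-dimensional latent code into the first m1 important-feature dimensions and the last m2 secondary-feature dimensions can be sketched as follows (with m1 = 5, m2 = 1 as stated later in the description; the helper name is illustrative):

```python
import numpy as np

M1, M2 = 5, 1  # dimensions of z_KF and z_SF (values given later in the patent)

def split_latent(z):
    """Split a latent code z = [z_KF, z_SF]: the first M1 dimensions are
    the important-feature code that determines the sample class, the last
    M2 dimensions are the sample-specific secondary-feature code."""
    z = np.asarray(z)
    return z[..., :M1], z[..., M1:M1 + M2]
```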
(3) obtaining variant latent codes of the samples through a latent-code reconstruction technique, and generating multiple reliable similar variant samples that preserve the important features of the input sample through decoder reconstruction, a mutual-information constraint, and the adversarial discriminator, specifically:

based on the sample latent codes obtained in step (2) and their important- and secondary-feature codes, the important-feature code is kept unchanged and the secondary-feature code is randomly replaced to generate similar samples sharing the characteristics of the given sample;

1) the variant latent code and the decoder recovery of the variant sample are computed as follows:

z̃_min = [z_min,KF, z̃_SF],  z̃_SF ~ N(0, I)
x̃_min = Dec(z̃_min)

where z̃_min is the variant latent code, x̃_min is the variant sample obtained by decoding z̃_min, and z̃_SF is sampled from its prior normal distribution N(0, I);

to make the important-feature code correspond to important attributes of the sample, a constraint maximizing the mutual information between minority-class samples and their important-feature codes is introduced on top of the optimization objective of step (2): the important-feature code of a sample can be inferred in reverse through a mutual-information inference model Q, with optimization objective L_Inference weighted by the hyperparameter β1; the model Q is implemented by reusing the encoder Enc, so that z′_min,KF = Q(x̂_min) is the important-feature code of the reconstructed sample x̂_min obtained by reverse inference, and z″_min,KF = Q(x̃_min) is the important-feature code of the variant sample x̃_min obtained by reverse inference;

2) to generate reliable variant samples, the GAN optimization objective L_GAN of step (2) is modified into two parts, the generator objective L_Gen and the discriminator objective L_Dis, weighted by the hyperparameters γ1, γ2, γ3, and γ4;

3) the above process is applied to each minority-class sample x_min in the data set to obtain its variant latent codes; after the traversal, the obtained variant samples are merged with the original minority samples into a new minority-class sample set, which is sampled until the number of samples matches the majority class, yielding a balanced data set;
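The variant-generation loop of this step can be sketched as follows, assuming a trained decoder is available as a callable `decode` (all names are illustrative; the real decoder is the trained VAE/GAN model described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def variant_latent(z_min, m1=5):
    """Keep the important-feature code (first m1 dims) unchanged and
    resample the secondary-feature code from its prior N(0, I)."""
    z = np.array(z_min, dtype=float)
    z[m1:] = rng.standard_normal(z.size - m1)
    return z

def rebalance(z_min_list, n_maj, decode, m1=5):
    """Decode the original minority latent codes, then keep generating
    variant samples until the minority count matches the majority count."""
    out = [decode(z) for z in z_min_list]
    i = 0
    while len(out) < n_maj:
        out.append(decode(variant_latent(z_min_list[i % len(z_min_list)], m1)))
        i += 1
    return out
```

Because only the secondary dimensions are resampled, every variant shares the class-determining important features of its source sample.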
(4) a feature repulsion technique acting between the latent codes of the two sample classes is designed to perform supervised feature representation learning, specifically:

based on the balanced data set obtained in step (3), in each training cycle an arbitrary batch of samples is taken to form Z_maj,KF and Z_min,KF, where Z_maj,KF is the feature set formed by the important-feature latent codes of all majority-class samples in the training batch and Z_min,KF is the corresponding set for all minority-class samples in the batch; the feature repulsion loss L_feature_force is calculated as follows:

d_loss = Nearest_Neighbor_distance(Z_min,KF, Z_maj,KF, n)

where Nearest_Neighbor_distance() is a distance-loss calculation function that, for each z_min,KF ∈ Z_min,KF, finds the n nearest neighbors in Z_maj,KF (here n = 20) and returns the set d_loss of the average distances obtained over all of Z_min,KF; L_feature_class is the class feature repulsion loss: the first dimension of the important-feature code is defined to represent the class information of the sample, ρ is a hyperparameter serving as the feature repulsion label of minority-class samples, and −ρ is the feature repulsion label of majority-class samples; μ1 and μ2 are hyperparameters;
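The Nearest_Neighbor_distance() function used by the feature repulsion term can be sketched in NumPy as follows (a brute-force version; the patent does not specify the implementation):

```python
import numpy as np

def nearest_neighbor_distance(z_min_kf, z_maj_kf, n=20):
    """For every minority important-feature code, average the Euclidean
    distances to its n nearest majority-class codes, returning the
    distance-loss set d_loss (one value per minority code)."""
    z_min_kf = np.asarray(z_min_kf, dtype=float)
    z_maj_kf = np.asarray(z_maj_kf, dtype=float)
    n = min(n, len(z_maj_kf))
    d_loss = []
    for z in z_min_kf:
        d = np.linalg.norm(z_maj_kf - z, axis=1)  # distances to all majority codes
        d_loss.append(np.sort(d)[:n].mean())      # mean over the n nearest
    return np.array(d_loss)
```

Maximizing the mean of d_loss pushes the minority important-feature codes away from the majority class in latent space, which is the repulsion effect described above.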
(5) a hybrid coding technique is used to append the per-dimension reconstruction errors of a sample as a supplement to the important-feature code, the classification result of the sample under test is judged on each two-class data set accordingly, and its fault category is obtained by hard voting, specifically:

based on cyclically training the model through steps (1) to (4), a reliable balanced data set is obtained; the important-feature code of a sample and its per-dimension reconstruction error e are combined as the new sample feature for distinguishing sample classes; for an input sample x, the constructed feature is denoted F = [z_KF, e], where e is the per-dimension reconstruction error between x and its VAE reconstruction x̂;

F and the corresponding class labels are input into a random forest classifier for training, giving a single imbalanced binary classifier; repeating this process yields 11 random forest classifiers RF_j, where j ∈ [1, 11] is the classifier index; for a sample under test x_test, the predicted label ŷ_test is obtained by hard voting over the outputs of the 11 classifiers.
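The hybrid feature construction and the final hard vote over the 11 one-vs-rest classifiers can be sketched as follows (plain callables stand in for the trained random forests; breaking ties by classifier order is an assumption, as the patent does not specify tie handling):

```python
import numpy as np

def hybrid_features(z_kf, x, x_hat):
    """Hybrid coding: append the per-dimension reconstruction error
    e = |x - x_hat| to the important-feature code, F = [z_KF, e]."""
    e = np.abs(np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float))
    return np.concatenate([np.asarray(z_kf, dtype=float), e])

def hard_vote(binary_classifiers, f):
    """Each one-vs-rest classifier votes whether the sample belongs to
    its class; the predicted fault category is the first class whose
    classifier votes positive, or None if no classifier fires."""
    votes = [j for j, clf in enumerate(binary_classifiers) if clf(f) == 1]
    return votes[0] if votes else None
```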
In the above method, in step (2), the structures of the encoder Enc, the decoder Dec, and the discriminator Dis are built from the following operations: Reshape() is a tensor reshaping function; Conv1D() is a one-dimensional convolutional layer constructor; Flatten() is a tensor flattening function; Dense() is a fully connected layer constructor; LeakyReLU and tanh denote the corresponding activation functions; Dropout() is a random deactivation (dropout) function.
In the above method, in step (2), α1 and α2 take the values 0.1 and 1 respectively; m1 is 5 and m2 is 1.

In the above method, in step (3), β1 is 1; γ1 and γ2 both take the value 0.5, and γ3 and γ4 both take the value 1.

In the above method, in step (4), ρ is 0.4; μ1 and μ2 take the values 1 and 2 respectively.
The smart meter fault classification method based on sample global rebalancing improves the precision and recall of smart meter fault classification.
According to the technical scheme, the invention has the following beneficial effects:
in the technical scheme implemented by the invention, sample generation targeting specified features is realized by redefining the VAE latent code, which avoids the data-authenticity and overfitting problems caused by data oversampling algorithms and mitigates the mode collapse problem of existing sample generation algorithms; the feature repulsion technique and the hybrid coding technique together alleviate the difficulty of feature extraction and classification for samples in the class-overlap region. The method improves the robustness of the model, and thereby the precision and recall of smart meter fault classification.
[Description of the Drawings]
In order to illustrate the technical solution of the present invention more clearly, the drawings required by the invention are briefly described below. The drawings in the following description are only some embodiments of the invention; for those skilled in the art, other drawings can be obtained from them without inventive labor.
FIG. 1 is a schematic flow chart of the framework of the smart meter fault classification method based on sample global rebalancing provided by the invention;

FIG. 2 is a schematic flow chart of smart meter fault classification;

FIG. 3 is a detailed schematic diagram of the algorithm of the invention.
[Detailed Description]
In order to better understand the technical scheme of the invention, the invention is described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the invention.
The invention provides a smart meter fault classification method based on sample global rebalancing. To handle smart meter fault classification, the invention decomposes the complex imbalanced multi-class problem into multiple imbalanced binary problems, designs a latent-code reconstruction technique to generate reliable variant samples that preserve the key features of a given sample, and obtains multiple balanced two-class data sets; a feature repulsion technique and a hybrid coding technique are designed to alleviate the difficulty of feature extraction and classification for samples in the class-overlap region; finally, the classification results of the sample under test on each two-class data set are integrated, and its fault category is obtained by hard voting.
Fig. 1 is a schematic flow chart of a framework of a method for classifying faults of an intelligent electric meter based on sample global rebalancing, which includes the following steps:
Step 101: input the actual smart meter fault data set, in which each sample comprises the characteristic variables working duration, arrival batch number, power supply unit number, electric energy meter category, fault identification month, installation month, province, equipment specification, communication mode, and equipment identification; the fault category labels comprise 11 types: appearance fault, metering fault, storage unit fault, processing unit fault, display unit fault, control unit fault, power supply unit fault, communication unit fault, clock unit fault, software fault, and other faults. Traverse each class of samples in the fault data set, taking all samples of that class as the minority-class sample set and all samples of the other classes as the majority-class sample set, so that the original data set is converted into 11 two-class data sets. Each two-class data set can be described as:

X = [X_min, X_maj]

where X is a two-class data set and x denotes any sample in it, i.e. x ∈ X; X_min is the minority-class sample set, with any sample x_min ∈ X_min; X_maj is the majority-class sample set, with any sample x_maj ∈ X_maj.
Step 102: based on the imbalanced two-class data set X = [X_min, X_maj] obtained in step 101, build and train a VAE/GAN model.

The VAE learns the posterior distribution q(z|x) of the latent variable of sample x with its encoder, and the distribution p(x|z) of sample x with its decoder:

z ~ Enc(x) = q(z|x)
x̂ ~ Dec(z) = p(x|z)

where Enc is the encoder, Dec is the decoder, z is the latent code obtained by passing sample x through the VAE encoder, and x̂ is the reconstructed sample generated by the VAE.

The decoder of the VAE is fused with the generator of the GAN to obtain the fusion model of VAE and GAN, whose optimization objective combines three weighted losses:

L = α1·L_prior + α2·L_likelihood + L_GAN
L_prior = D_KL(q(z|x) ‖ N(0, I))

where α1 and α2 are hyperparameters; L_prior is the prior loss caused by the difference between the distributions q(z|x) and N(0, I), N(0, I) being the normal distribution with mean 0 and variance 1 and D_KL(·) the KL divergence; L_likelihood is the likelihood loss obtained by sampling q(z|x) and evaluating p(x|z); L_GAN is the adversarial loss of the GAN; z_min is the latent code of x_min and z_maj that of x_maj; E[·] is the expectation operator and Dis is the discriminator.

Based on this optimization objective, an m-dimensional latent code z containing the features of a given sample x is obtained. During training, the latent code that the VAE assigns to sample x is split into two parts, an important-feature code z_KF and a secondary-feature code z_SF. The important-feature code corresponds to important attributes of the sample, is a high-level feature representation, and is the key feature determining the sample's class, while the secondary-feature code captures individual attributes of the sample, generally without class generality, so a change in it does not change the sample's class. That is, the first m1 dimensions of z form z_KF, the last m2 dimensions form z_SF, and m = m1 + m2. The latent code z_min of sample x_min is expressed as:

z_min = [z_min,KF, z_min,SF] = Enc(x_min)

where z_min,KF is the important-feature code of x_min and z_min,SF its secondary-feature code.
Step 103: obtain variant latent codes of the input samples through the latent-code reconstruction technique, and generate multiple reliable similar variant samples that preserve the important features of the input sample through decoder reconstruction, the mutual-information constraint, and the adversarial discriminator, specifically:

based on the sample latent codes obtained in step 102 and their important- and secondary-feature codes, the important-feature code is kept unchanged and the secondary-feature code is randomly replaced to generate similar samples sharing the characteristics of the given sample;

1) the variant latent code and the decoder recovery of the variant sample are computed as follows:

z̃_min = [z_min,KF, z̃_SF],  z̃_SF ~ N(0, I)
x̃_min = Dec(z̃_min)

where z̃_min is the variant latent code, x̃_min is the variant sample obtained by decoding z̃_min, and z̃_SF is sampled from its prior normal distribution N(0, I);

to make the important-feature code correspond to important attributes of the sample, a constraint maximizing the mutual information between minority-class samples and their important-feature codes is introduced on top of the optimization objective of step 102: the important-feature code of a sample can be inferred in reverse through a mutual-information inference model Q, with optimization objective L_Inference weighted by the hyperparameter β1; the model Q is implemented by reusing the encoder Enc, so that z′_min,KF = Q(x̂_min) is the important-feature code of the reconstructed sample x̂_min obtained by reverse inference, and z″_min,KF = Q(x̃_min) is the important-feature code of the variant sample x̃_min obtained by reverse inference;

2) to generate reliable variant samples, the GAN optimization objective L_GAN of step 102 is modified into two parts, the generator objective L_Gen and the discriminator objective L_Dis, weighted by the hyperparameters γ1, γ2, γ3, and γ4;

3) the above process is applied to each minority-class sample x_min in the data set to obtain its variant latent codes; after the traversal, the obtained variant samples are merged with the original minority samples into a new minority-class sample set, which is sampled until the number of samples matches the majority class, yielding a balanced data set.
Step 104: based on the balanced data set obtained in step 103, in each training cycle an arbitrary batch of samples is taken to form Z_maj,KF and Z_min,KF, where Z_maj,KF is the feature set formed by the important-feature latent codes of all majority-class samples in the training batch and Z_min,KF is the corresponding set for all minority-class samples in the batch. The feature repulsion loss L_feature_force is calculated as follows:

d_loss = Nearest_Neighbor_distance(Z_min,KF, Z_maj,KF, n)

where Nearest_Neighbor_distance() is a distance-loss calculation function that, for each z_min,KF ∈ Z_min,KF, finds the n nearest neighbors in Z_maj,KF (n = 20) and returns the set d_loss of the average distances obtained over all of Z_min,KF. L_feature_class is the class feature repulsion loss: the first dimension of the important-feature code is defined to represent the class information of the sample, ρ is a hyperparameter serving as the feature repulsion label of minority-class samples, and −ρ is the feature repulsion label of majority-class samples; μ1 and μ2 are hyperparameters.
105, superposing the reconstruction error of each dimension of the sample as the supplement of the important feature code by using a mixed coding technology, judging the classification result of the sample to be detected under each two-class data set according to the supplement, and obtaining the fault category of the sample by hard voting, wherein the steps are as follows:
based on the cyclic training model from step 101 to step 104, a reliable balanced data set can be obtainedCombining important feature codes of the samples and reconstruction errors e of each dimension to serve as new sample features for distinguishing sample categories; for an input sample x, its structural features are denoted as F ═ zKF,e]And e is calculated as follows:
F and the corresponding class label are input into a random forest classifier for training, yielding a single imbalanced binary classifier; repeating this process yields 11 random forest classifiers RF_j, where j ∈ [1, 11] is the classifier index. For a sample x_test under test, its predicted label is calculated as follows:
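One hedged reading of the prediction step, with scikit-learn as an assumed stand-in: one binary random forest per fault class (class j vs. rest), with ties between positive votes broken by the positive-class probability (the tie-breaking rule is an assumption, not stated in the text):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
N_CLASSES = 11
X = rng.standard_normal((330, 14))            # stand-in for the features F = [z_KF, e]
y = rng.integers(0, N_CLASSES, size=330)      # toy fault-category labels

# Train one binary RF_j per fault class, mirroring the 11 binary data sets.
forests = []
for j in range(N_CLASSES):
    rf = RandomForestClassifier(n_estimators=25, random_state=j)
    rf.fit(X, (y == j).astype(int))
    forests.append(rf)

def predict_hard_vote(x):
    """Each binary classifier scores its own class; the class with the
    strongest positive score wins the vote."""
    scores = np.array([rf.predict_proba(x.reshape(1, -1))[0, 1] for rf in forests])
    return int(np.argmax(scores))

pred = predict_hard_vote(X[0])
```

Any multi-class strategy over the 11 binary classifiers fits the same skeleton; only `predict_hard_vote` would change.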
FIG. 2 is a schematic flow chart of the method for smart meter fault classification: the smart meter fault history data set is first divided by category into several binary data sets; VAE-SLGAN is applied to balance each sample set, producing several balanced binary data sets on which random forest binary classifiers are respectively trained; for a sample under test, the classifiers obtained from the binary data sets each produce a judgment, and the final predicted category is obtained by hard voting.
FIG. 3 is a detailed schematic diagram of the algorithm of the present invention, which proposes a latent-code reconstruction technique based on a fusion model of VAE and GAN to address the mode collapse occurring in the sample generation process; a feature repulsion technique and a hybrid coding technique are designed to address the difficulty of feature mining and of classifying overlapping regions.
Algorithm 1 is the pseudocode of the method for calculating the number of samples to generate for each minority-class sample:
Algorithm 2 is the pseudocode of the global sample generation method:
Algorithm 3 is the pseudocode of the feature repulsion calculation:
In the specific embodiment, testing uses smart meter fault history data sets of different categories; the data set collects smart meter data of 11 fault types from 25 provinces. Owing to factors such as manual statistics and external conditions, the data labels contain errors and omissions; after data cleaning with techniques such as cluster analysis, missing-value completion, and outlier processing, 15885 fault samples remain. To reduce the randomness of the results, the data set was randomly divided into a training set and a test set at a ratio of 8:2 with a fixed random seed.
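The reproducible 8:2 split can be sketched with scikit-learn (the seed value 42 and the stratification are assumptions; the text only states a fixed random seed):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.standard_normal((15885, 9))           # placeholder for the cleaned fault samples
y = rng.integers(0, 11, size=15885)           # placeholder fault-category labels

# A fixed random_state reproduces the same 8:2 split on every run;
# stratify=y keeps the per-class fault ratios in both halves (an assumption).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```

Stratification matters here because several fault classes are rare; an unstratified split could leave a minority class nearly absent from the test set.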
TABLE 1 data set used in the specific examples
To verify the effectiveness of the proposed algorithm, the embodiment of the present invention compares against 8 mainstream oversampling methods and 5 mainstream deep learning sample generation methods, as shown in Table 2. Considering the large data volume and complex categories of the smart meter fault data set, RF is used as the classifier to verify the sample balancing effect. The embodiment of the invention is denoted VAE-SLGAN in the tables.
TABLE 2 algorithm for comparison in the specific examples
The embodiment of the invention uses the macro-F1 and G-mean indexes to evaluate the classification effect of the algorithm. The macro-F1 index is the arithmetic mean of the per-category F1-measure and comprehensively evaluates the precision and recall of the model on each category. The G-mean index is the geometric mean of the per-category recall and evaluates the recall of the model across categories. Both macro-F1 and G-mean range from 0 to 1, and larger values indicate better classification performance.
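The two indexes follow directly from their definitions above; a small worked example with scikit-learn (toy labels, purely illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score, recall_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

# macro-F1: arithmetic mean of the per-category F1-measure
macro_f1 = f1_score(y_true, y_pred, average="macro")

# G-mean: geometric mean of the per-category recall
recalls = recall_score(y_true, y_pred, average=None)
g_mean = float(np.prod(recalls) ** (1.0 / len(recalls)))
```

Because the G-mean multiplies per-class recalls, a single class with recall near zero drags the whole score toward zero, which is exactly why it is a sensitive indicator for imbalanced fault data.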
Table 3 compares the embodiment of the invention with the mainstream oversampling methods on the F1-measure under each smart meter fault category and on macro-F1; Table 4 compares them on per-category recall and G-mean. The proposed smart meter fault classification method based on sample global rebalancing obtains F1-measure and recall exceeding the other methods on most categories, and obtains the highest macro-F1 and G-mean. Taken together, Tables 3 and 4 show that its balancing effect on smart meter fault samples is better than that of the oversampling methods, yielding higher classification precision and recall.
Table 3 experimental results of VAE-SLGAN and oversampling method on F1-measure and macro-F1 indexes under each fault category of intelligent electric meter
Table 4 experimental results of VAE-SLGAN and oversampling method on recall rate and G-mean index of intelligent electric meter under each fault category
Table 5 compares the embodiment of the invention with the mainstream deep learning generation methods on the F1-measure under each smart meter fault category and on macro-F1; Table 6 compares them on per-category recall and G-mean. The proposed method again obtains F1-measure and recall exceeding the other methods on most categories, and the highest macro-F1 and G-mean. Taken together, Tables 5 and 6 show that its balancing effect on smart meter fault samples is better than that of the deep learning generation methods, yielding higher classification precision and recall.
Table 5 experimental results of VAE-SLGAN and deep learning generation method on F1-measure and macro-F1 indexes under each fault category of intelligent electric meter
Table 6 experimental results of VAE-SLGAN and deep learning generation method on recall rate and G-mean index of intelligent electric meter under each fault category
Extensive comparisons against mainstream oversampling methods and deep learning sample generation methods show that, when dealing with the smart meter multi-classification problem characterized by multiple distribution modes within classes and feature overlap between classes, the method can reliably balance the samples, provide the classifier with features helpful for mining class differences, and effectively improve the precision and recall of faulty meter classification.
In summary, the embodiments of the present invention have the following beneficial effects:
In the technical scheme, fault history data of smart meters of different categories is used as the input data set and divided into several binary data sets; for each binary data set, a fusion model of VAE and GAN is constructed, each sample is taken as model input, and its latent code is split into an important-feature code and a secondary-feature code; variant latent codes of the samples are obtained by the latent-code reconstruction technique, and several reliable similar variant samples that retain the important features of the input samples are generated through decoder restoration, mutual-information constraint, and the adversarial discriminator; a feature repulsion technique acting between the latent codes of the two sample classes performs supervised feature representation learning; and the per-dimension reconstruction errors of a sample are superposed by the hybrid coding technique as a supplement to its important-feature code, the classification result of the sample under test is determined under each binary data set accordingly, and its fault category is obtained by hard voting.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (5)
1. A smart meter fault classification method based on sample global rebalancing is characterized by comprising the following steps:
(1) Fault history data of smart meters of different categories is taken as the input data set and divided into a plurality of binary data sets, specifically:
An actual smart meter fault data set is input; the samples in the data set comprise 9 characteristic variables: working duration, factory batch number, power-supply-unit number, electric energy meter type, fault-identification month, installation month, province, equipment specification, communication mode, and equipment identifier. The fault category labels comprise 11 types: appearance fault, metering fault, storage unit fault, processing unit fault, display unit fault, control unit fault, power supply unit fault, communication unit fault, clock unit fault, other fault, and software fault. Each category of samples in the fault data set is traversed: all samples of that category are taken as the minority-class sample set and all samples of the other categories as the majority-class sample set, converting the original data set into 11 binary data sets. Each binary data set can be described as:
X = [X_min, X_maj]
where X is a binary data set and x denotes an arbitrary sample in it, i.e. x ∈ X; X_min is the minority-class sample set, with x_min ∈ X_min an arbitrary minority-class sample; X_maj is the majority-class sample set, with x_maj ∈ X_maj an arbitrary majority-class sample;
(2) For each binary data set, a fusion model of VAE and GAN is constructed; each sample is taken as model input, and its latent code is split into an important-feature code and a secondary-feature code, specifically:
Based on the imbalanced binary data set X = [X_min, X_maj] obtained in step (1), a VAE/GAN model is built and trained:
the VAE learns the implicit posterior distribution q (z | x) of the sample x by the encoder and the distribution p (x | z) of the sample x by the decoder as follows:
z~Enc(x)=q(z|x)
where Enc is the encoder and Dec the decoder; z is the latent code obtained by passing sample x through the VAE encoder, and x̂ is the reconstructed sample generated by the VAE;
The decoder of the VAE is fused with the generator of the GAN to obtain the fusion model of VAE and GAN, whose optimization target is as follows:
where α1 and α2 are hyperparameters; L_prior is the prior loss arising from the difference between the distributions q(z|x) and N(0, I), with N(0, I) the normal distribution of mean 0 and variance 1 and D_KL(·) the KL divergence; L_likelihood is the likelihood loss obtained by sampling q(z|x) into p(x|z); L_GAN is the adversarial loss of the GAN; z_min is the latent code of x_min and z_maj the latent code of x_maj; E[·] is the expectation calculation function and Dis is the discriminator;
Based on this optimization target, an m-dimensional latent code z containing the features of a given sample x can be obtained. During training, the latent code of sample x is split into an important-feature code z_KF and a secondary-feature code z_SF. The important-feature code corresponds to important attributes of the sample — a high-level feature representation that is key to determining the sample category — while the secondary-feature code captures individualized attributes of the sample with no generality, whose change does not alter the sample category. That is, the first m1 dimensions of z form z_KF and the last m2 dimensions form z_SF, with m = m1 + m2; the latent code z_min of sample x_min is expressed as follows:
z_min = [z_min,KF, z_min,SF] = Enc(x_min)
where z_min,KF is the important-feature code of x_min and z_min,SF is its secondary-feature code;
(3) Variant latent codes of the samples are obtained by the latent-code reconstruction technique, and several reliable similar variant samples that retain the important features of the input samples are generated through decoder restoration, mutual-information constraint, and discriminator adversarial training, specifically:
Based on the sample latent codes obtained in step (2) and their important-feature and secondary-feature codes, the important-feature code is kept unchanged and the secondary-feature code is randomly replaced to generate similar samples sharing the characteristics of the given sample;
1) The variant latent code and the decoder restoration process of the variant sample are calculated as follows:
where the variant latent code is obtained by keeping the important-feature code z_min,KF and replacing the secondary-feature code with one sampled from its prior normal distribution N(0, I), and the variant sample is obtained by passing the variant latent code through the decoder;
To make the important-feature codes correspond to important attributes of the samples, a mutual-information maximization constraint between the minority-class samples and their important-feature codes must be introduced on top of the optimization target of step (2): the important-feature code of a sample can be inferred back through a mutual-information inference model Q. The mutual-information optimization target L_Inference is as follows:
where β1 is a hyperparameter and the inference model Q is realized by the encoder Enc; z′_min,KF is the important-feature code of the reconstructed sample obtained by reverse inference, and z″_min,KF is the important-feature code of the variant sample obtained by reverse inference;
2) To generate reliable variant samples, the GAN optimization target L_GAN of step (2) is modified into the following two parts:
where L_Gen is the optimization target of the generator and L_Dis that of the discriminator; γ1, γ2, γ3, and γ4 are hyperparameters;
3) For each minority-class sample x_min in the data set, the above process is applied to obtain its variant latent codes; after the traversal is finished, the obtained variant samples are combined with the original minority-class samples into a new minority-class sample set, which is sampled until the number of samples matches the majority class, yielding a balanced data set;
(4) A feature repulsion technique acting between the latent codes of the two sample classes is designed for supervised feature representation learning, specifically:
Based on the balanced data set obtained in step (3), for an arbitrary batch of samples in a training cycle take Z_maj,KF and Z_min,KF, where Z_maj,KF is the feature set formed by the important-feature latent codes of all majority-class samples in the batch, and Z_min,KF is the feature set formed by the important-feature latent codes of all minority-class samples in the batch; the feature repulsion loss L_feature_force is calculated as follows:
d_loss = Nearest_Neighbor_distance(Z_min,KF, Z_maj,KF, n)
where Nearest_Neighbor_distance() is a distance-loss calculation function: for each z_min,KF ∈ Z_min,KF it finds the n nearest neighbors in Z_maj,KF (with n = 20) and returns the set of average distances d_loss over all of Z_min,KF. L_feature_class is the class feature-repulsion loss; the first dimension of the important-feature code is defined to represent the class of the sample, with the hyperparameter ρ as the feature-repulsion label of minority-class samples and −ρ as that of majority-class samples; μ1 and μ2 are hyperparameters;
(5) The per-dimension reconstruction errors of a sample are superposed by the hybrid coding technique as a supplement to its important-feature code; the classification result of the sample under test is determined under each binary data set accordingly, and its fault category is obtained by hard voting, specifically:
Based on the cyclic training of the models in steps (1) to (4), a reliable balanced data set can be obtained. The important-feature code of a sample is combined with its per-dimension reconstruction error e as a new sample feature for distinguishing sample categories; for an input sample x, the combined feature is denoted F = [z_KF, e], and e is calculated as follows:
F and the corresponding class label are input into a random forest classifier for training, yielding a single imbalanced binary classifier; repeating this process yields 11 random forest classifiers RF_j, where j ∈ [1, 11] is the classifier index. For a sample x_test under test, its predicted label is calculated as follows:
2. The method for classifying the faults of the smart meter based on the sample global rebalancing according to claim 1, wherein in the step (2), the structures of the encoder Enc, the decoder Dec and the discriminator Dis are as follows:
wherein Reshape () is a tensor warping function; conv1D () is a one-dimensional convolutional layer building function; flatten () is a tensor flattening function; dense () is the full connection layer building function; LeakyRelu and tanh represent the corresponding activation functions; dropout () is a random deactivation function.
3. The method for classifying faults of smart meters based on sample global rebalancing according to claim 1, wherein in step (2), α1 and α2 take the values 0.1 and 1, respectively; m1 is 5 and m2 is 1.
4. The method for classifying faults of smart meters based on sample global rebalancing according to claim 1, wherein in step (3), β1 is 1; γ1 and γ2 are both 0.5, and γ3 and γ4 are both 1.
5. The method for classifying faults of smart meters based on sample global rebalancing according to claim 1, wherein in step (4), ρ takes the value 0.4; μ1 and μ2 take the values 1 and 2, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210348671.1A CN114781495A (en) | 2022-04-01 | 2022-04-01 | Intelligent ammeter fault classification method based on sample global rebalancing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114781495A true CN114781495A (en) | 2022-07-22 |
Family
ID=82426386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210348671.1A Pending CN114781495A (en) | 2022-04-01 | 2022-04-01 | Intelligent ammeter fault classification method based on sample global rebalancing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114781495A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117092446A (en) * | 2023-10-20 | 2023-11-21 | 国网山东省电力公司嘉祥县供电公司 | Power transmission line fault detection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||