CN113792825B

CN113792825B - Fault classification model training method and device for electricity information acquisition equipment

Info

Publication number: CN113792825B
Application number: CN202111358733.9A
Authority: CN
Inventors: 李悦; 周玉; 杜新纲; 葛得辉; 高凡; 黄奇峰; 邵雪松; 易永仙; 王舒; 支亚薇
Original assignee: State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Current assignee: State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority date: 2021-11-17
Filing date: 2021-11-17
Publication date: 2022-08-02
Anticipated expiration: 2041-11-17
Also published as: CN113792825A

Abstract

The application provides a method and a device for training a fault classification model of power consumption information acquisition equipment, wherein the method comprises the following steps: performing data processing on the fault information sample set to obtain a training set; inputting the training set into a plurality of classifiers for training to obtain the recall rates of different fault types of the plurality of classifiers; and according to the recall rate, performing weighted fusion on the plurality of classifiers to obtain a target fault classification model. The target fault classification model is constructed by training a plurality of classifiers and performing weighted fusion by using the recall rate, and the prediction classification of the unknown class label fault data is performed according to the target fault classification model, so that the advantages of different single classifiers are comprehensively embodied, and the accuracy and stability of the training model for effectively predicting the fault classification are improved.

Description

Fault classification model training method and device for electricity information acquisition equipment

Technical Field

The application relates to the technical field of machine learning classification, in particular to a method and a device for training a fault classification model of power consumption information acquisition equipment.

Background

With the rapid development of smart power grids, smart electric energy meters have become the most important component devices of power consumption information acquisition systems as devices for measuring electric power. Nowadays, the coverage rate of the construction of the power utilization information acquisition system is higher and higher, and the diversification of the fault types and the continuous rising of the fault frequency of the intelligent electric energy meter bring new challenges to operation and maintenance work. The traditional method for fault diagnosis is to find out users with abnormal operation and suspected abnormal power consumption of the intelligent electric energy meter through a continuous difference algorithm according to collected current, voltage and user power information.

Disclosure of Invention

In view of this, an object of the present application is to provide a method and an apparatus for training a fault classification model of power consumption information collection equipment. By training the weighted fusion model after the problem of unbalanced fault data categories is solved, the accuracy and stability of the classification model for effectively predicting the fault are improved.

In a first aspect, an embodiment of the present application provides a method for training a fault classification model of power consumption information acquisition equipment, including: performing data processing on the fault information sample set to obtain a training set; inputting the training set into a plurality of classifiers for training to obtain recall rates of different fault types of the plurality of classifiers; and performing weighted fusion on the plurality of classifiers according to the recall rate to obtain a target fault classification model.

In the implementation process, a series of data processing is performed through the acquired fault information sample data, the processed data is input into a plurality of classifiers for training, a weighted fusion classification model established based on the plurality of classifiers is trained, and subsequent fault prediction is performed through the weighted fusion classification model. The target fault classification model is constructed in a voting mode of class accuracy weighting, results of different single classifiers with different classification accuracy of different types of fault information can be comprehensively considered, advantages of the different single classifiers can be embodied, and therefore accuracy and stability of effective classification and prediction of the training model are further improved.

With reference to the first aspect, an embodiment of the present application provides a first possible implementation manner of the first aspect, where: the processing of data in the fault information sample set to obtain a training set includes: performing data preprocessing on the fault information sample set to obtain an initial training set; and adjusting the unbalance proportion of the initial training set to obtain the training set.

In the implementation process, the acquired fault information data sample set is used as the original data sample set, and a series of data processing operations, such as preprocessing and class-imbalance adjustment, is performed on it before the classification algorithm starts training. On one hand, this prevents the classification model from failing because of defective data; on the other hand, it accelerates training of the classification model, improves the accuracy of the algorithm, and improves the adaptability of the classification model to an unbalanced data set.

With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a second possible implementation manner of the first aspect, where: the data preprocessing is performed on the fault information sample set to obtain an initial training set, and the method comprises the following steps: identifying the fault information sample set to determine a missing sample; processing the missing samples to obtain an initial training set; or, carrying out standardization processing on the fault information sample set to obtain an initial training set; or, performing correlation analysis on the fault information sample set to determine the correlation of each sample in the fault information sample set, and screening the samples according to the correlation of each sample in the fault information sample set to obtain an initial training set.

In the implementation process, in the process of preprocessing the acquired fault information data sample set: based on the complexity of the original data sample set, the missing data in the original data sample set can be processed in a related manner, the influence of the missing data is eliminated, and the situation that a subsequent training classification model cannot work normally due to the missing data is prevented; based on the complexity of an original data sample set, the original data sample set can be standardized, namely mean variance normalization dimensionless processing is carried out, different attribute characteristics are scaled to the same numerical value interval, different indexes can be compared with each other, and the influence of the attribute characteristics on a distance-based classifier such as a K nearest neighbor method classifier can be reduced; based on the complexity of the original data sample set, correlation analysis can be performed on the original data sample set, attribute features and redundant features with weak correlation with fault types are eliminated, and data attributes most beneficial to construction of a classification model are selected, so that training for constructing the classification model can be accelerated, and the precision of a weighted fusion classification model is improved.
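The three preprocessing options described above can be illustrated with a minimal sketch. It assumes numeric attribute feature values, mean imputation for missing entries, z-score (mean-variance) normalization, and Pearson correlation against the numeric class label as the screening criterion; the function names, the imputation strategy, and the 0.1 threshold are illustrative assumptions and are not specified in the application.

```python
import math
from statistics import mean, pstdev

def fill_missing(rows):
    """Replace None entries with the mean of the observed values in the same column.
    Assumes every column has at least one observed value."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def standardize(rows):
    """Mean-variance (z-score) normalization applied column by column."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # constant columns keep a divisor of 1
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def select_features(rows, labels, threshold=0.1):
    """Keep the columns whose absolute correlation with the label reaches the threshold."""
    cols = list(zip(*rows))
    keep = [j for j, c in enumerate(cols) if abs(pearson(c, labels)) >= threshold]
    return keep, [[row[j] for j in keep] for row in rows]
```

In this sketch a constant column carries no information about the fault type, so its correlation is reported as 0.0 and it is screened out as a redundant feature.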

With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a third possible implementation manner of the first aspect, where: adjusting the imbalance ratio of the initial training set to obtain the training set includes: performing oversampling on the minority-class samples of the initial training set to obtain an oversampled initial training set; and performing undersampling on the oversampled initial training set to eliminate its noise values, so as to obtain the training set in which the data imbalance is eliminated.

In the implementation process, the imbalance ratio of the preprocessed data set is adjusted by a mixed sampling mode that combines oversampling and undersampling. This avoids the defects of using oversampling or undersampling alone: it prevents the loss of majority-class sample data and prevents overfitting on the minority-class samples. By reconstructing the sample data set, the distribution of the majority-class and minority-class samples reaches a new balanced state, so the effect is better.
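The mixed sampling idea can be sketched as follows. This is only an illustration under stated assumptions: the oversampling step interpolates between two random samples of the same class (a simplified, SMOTE-like step), and the undersampling step drops any sample whose nearest neighbour carries a different label (a simplified nearest-neighbour cleaning rule). The application does not name the concrete sampling algorithms, so both choices and all function names are hypothetical.

```python
import random
from collections import Counter

def oversample_minority(X, y, rng):
    """Grow every class to the size of the largest class by interpolating
    between two randomly chosen samples of that class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            a, b = rng.choice(pool), rng.choice(pool)
            lam = rng.random()
            X_out.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
            y_out.append(label)
    return X_out, y_out

def remove_noise(X, y):
    """Cleaning-style undersampling: a sample whose nearest neighbour has a
    different label is treated as a noise value and removed."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    keep = []
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: dist2(xi, X[k]))
        if y[nearest] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

A typical use would call `oversample_minority` first and then `remove_noise` on its output, mirroring the order of the two steps in the third possible implementation manner.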

With reference to the first possible implementation manner of the first aspect, an embodiment of the present application provides a fourth possible implementation manner of the first aspect, where: the classifiers are one or more of K nearest neighbor classifiers, decision tree classifiers, support vector machine classifiers, Bayesian classifiers or random forest classifiers in any combination.

In the implementation process, training is carried out based on several common classifiers, a final target fault classification model is obtained through training after weighted fusion, and results of different single classifiers for different fault information types with different classification accuracy rates are comprehensively considered. The unique advantages of different single classifiers are utilized, the combination effect of each single classifier is comprehensively compared, a target fault classification model which is more stable and reliable than the single classifiers is constructed, and the accuracy of classification prediction is integrally improved.
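As an illustration of one of the single classifiers listed above, the following is a minimal K nearest neighbor classifier sketch. The class name, the `fit`/`predict_one` interface, the squared Euclidean distance, and the default `k=3` are assumptions chosen for illustration; the application does not prescribe an implementation.

```python
from collections import Counter

class KNNClassifier:
    """Minimal K nearest neighbor classifier using squared Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: training only stores the samples.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Rank the stored samples by distance to x and vote among the k closest.
        order = sorted(
            range(len(self.X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.X[i])))
        votes = Counter(self.y[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]
```

Because the prediction depends directly on distances, attribute features on very different scales would dominate the vote, which is why the standardization step matters for this kind of classifier.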

In a second aspect, an embodiment of the present application provides a method for determining a fault type of power consumption information collecting equipment, including: acquiring to-be-identified power utilization fault information data; and inputting the power utilization fault information data to be identified into a target fault classification model determined by the power utilization information acquisition equipment fault classification model training method for identification so as to determine the fault type of the power utilization fault information data to be identified.

In the implementation process, the classification model for judging and determining the fault type is obtained through training and weighting fusion by the power consumption information acquisition equipment fault classification model training method, namely the classification model is a required target fault classification model. The power utilization fault information data to be identified are input into the constructed target fault classification model, and the classification model is used for predicting the fault type, so that the type of the power utilization fault information data fault can be identified more accurately, the faults of the power utilization information acquisition equipment can be classified and predicted accurately, operation and maintenance personnel can be guided to rapidly check the fault type, the fault reason can be analyzed, the fault prediction can be carried out on the equipment in the same area and in the same batch, and the human resources and the time cost are reduced.

In a third aspect, an embodiment of the present application further provides a power consumption information collecting device fault classification model training device, including: the processing module is used for carrying out data processing on the fault information sample set to obtain a training set; the training module is used for inputting the training set into a plurality of classifiers for training to obtain the recall rates of the plurality of classifiers in different fault types; and the fusion module is used for performing weighted fusion on the classifiers according to the recall rate so as to obtain a target fault classification model.

In the implementation process, the power consumption information acquisition equipment fault classification model training device comprises a processing module, a training module, a fusion module and the like. The processing module can perform data processing on the fault information sample set to obtain a training set; the training module can input the training set into a plurality of classifiers for training to obtain the recall rates of the plurality of classifiers in different fault types; and the fusion module can perform weighted fusion on the plurality of classifiers according to the recall rate so as to obtain a target fault classification model.

In a fourth aspect, an embodiment of the present application further provides a device for classifying faults of power consumption information collection equipment, including: the acquisition module is used for acquiring the power utilization fault information data to be identified; and the classification module is used for inputting the power utilization fault information data to be identified into the target fault classification model determined by the power utilization information acquisition equipment fault classification model training method for identification so as to determine the fault type of the power utilization fault information data to be identified.

In the implementation process, the power consumption information acquisition equipment fault classification device comprises an acquisition module, a classification module and the like. The acquisition module can acquire the power utilization fault information data to be identified; and the classification module can input the power consumption fault information data to be identified into the target fault classification model determined by any power consumption information acquisition equipment fault classification model training method for identification so as to determine the fault type of the power consumption fault information data to be identified.

In a fifth aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory storing machine-readable instructions executable by the processor, the machine-readable instructions, when executed by the processor, performing the steps of the method of the first aspect described above, or any possible implementation of the first aspect, when the electronic device is run.

In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the method for training the fault classification model of the power consumption information collection device and the method for determining the fault type of the power consumption information collection device in the first aspect, or any possible implementation manner of the first aspect.

The embodiments of the application provide a method and a device for training a fault classification model of power consumption information acquisition equipment, a method and a device for determining the fault type of power consumption information acquisition equipment, an electronic device, and a computer-readable storage medium. A plurality of classifiers are trained and fused by weighting with the recall rate to construct the target fault classification model. Compared with the prior art, in which a single classifier is trained and the classification prediction is uncertain and of low accuracy, the method trains a plurality of single classifiers, uses the recall rate as the weight in the weighted fusion, and trains the model in a voting mode based on class accuracy weighting, so that the advantages of the different single classifiers are reflected and the accuracy and stability of the trained model's classification prediction are further improved. The power utilization fault information data to be identified is input into the constructed target fault classification model, and the fault type is predicted by the weighted fusion classification model, so that the fault type of the data can be accurately identified and the faults of the power consumption information acquisition equipment can be accurately classified and predicted.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 is a block diagram of an electronic device according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of a power consumption information acquisition device fault classification model training method according to an embodiment of the present application;

Fig. 3 is a flowchart of solving the recall rate according to an embodiment of the present application;

Fig. 4 is a flowchart of a method for training a target fault classification model according to an embodiment of the present application;

Fig. 5 is a functional module schematic diagram of a power consumption information acquisition device fault classification model training device according to an embodiment of the present application;

Fig. 6 is a flowchart of a method for determining a fault type of power consumption information acquisition equipment according to an embodiment of the present application;

Fig. 7 is a functional module schematic diagram of a fault classification device for power consumption information collection equipment according to an embodiment of the present application.

Reference numerals: 100-electronic device; 111-memory; 112-memory controller; 113-processor; 114-peripheral interface; 115-input/output unit; 116-display unit; 310-processing module; 320-training module; 330-fusion module; 510-acquisition module; 520-classification module.

Detailed Description

The technical solution in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

The inventor of the application notices that if the faults of the power consumption information acquisition equipment can be accurately classified and predicted, operation and maintenance personnel can be guided to rapidly troubleshoot the fault types, fault reasons are analyzed, and power consumption safety and stability are ensured. With the development of big data technology, monitoring is carried out by applying a machine learning classification method, so that the human resource and time cost can be greatly reduced.

At present, for fault category diagnosis of intelligent electric energy meters, one approach uses a convolutional neural network classifier to judge fault categories, which requires a large amount of sample data for training to improve accuracy. Another approach uses an improved particle swarm algorithm together with a support vector machine classification model, which needs to train a plurality of binary classifiers to search for optimal parameters, so the training time is long. Because the classification accuracy of a single classifier varies greatly across fault categories, effective classification prediction cannot be guaranteed and the prediction is uncertain. For fault information data with unbalanced classes, the training efficiency and classification accuracy of the classifier are generally low.

Based on the research, the embodiment of the application provides a method and a device for training a fault classification model of power consumption information acquisition equipment. The target model can be trained in a voting mode based on class accuracy weighting, and accuracy and stability of the target model in fault information classification and prediction are improved effectively. This is described below by means of several examples.

In order to facilitate understanding of the present embodiment, first, an electronic device executing the power consumption information acquisition device fault classification model training method and the power consumption information acquisition device fault type determination method disclosed in the embodiments of the present application will be described in detail.

Fig. 1 is a block schematic diagram of an electronic device. The electronic device 100 may include a memory 111, a memory controller 112, a processor 113, a peripheral interface 114, an input/output unit 115, and a display unit 116. It will be understood by those of ordinary skill in the art that the structure shown in Fig. 1 is merely exemplary and is not intended to limit the structure of the electronic device 100. For example, the electronic device 100 may include more or fewer components than shown in Fig. 1, or have a different configuration than shown in Fig. 1.

The above-mentioned elements of the memory 111, the memory controller 112, the processor 113, the peripheral interface 114, the input/output unit 115 and the display unit 116 are electrically connected to each other directly or indirectly, so as to implement data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The processor 113 is used to execute the executable modules stored in the memory.

The memory 111 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 111 is configured to store a program, and the processor 113 executes the program after receiving an execution instruction. The method executed by the electronic device 100, as defined by the process disclosed in any embodiment of the present application, may be applied to the processor 113 or implemented by the processor 113.

The processor 113 may be an integrated circuit chip having signal processing capability. The processor 113 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.

The peripheral interface 114 couples various input/output devices to the processor 113 and the memory 111. In some embodiments, the peripheral interface 114, the processor 113, and the memory controller 112 may be implemented in a single chip. In other examples, they may each be implemented by a separate chip.

The input/output unit 115 is used by the user to input data. The input/output unit 115 may be, but is not limited to, a mouse, a keyboard, and the like.

The display unit 116 provides an interactive interface (e.g., a user operation interface) between the electronic device 100 and the user for reference. In this embodiment, the display unit 116 may be a liquid crystal display or a touch display. The liquid crystal display or the touch display can display the process of the program executed by the processor.

The electronic device 100 in this embodiment may be configured to perform each step in each method provided in this embodiment. The implementation process of the fault classification model training method for the power consumption information acquisition equipment is described in detail through several embodiments.

Please refer to fig. 2, which is a flowchart of a method for training a fault classification model of power consumption information collection equipment according to an embodiment of the present application. The specific process shown in fig. 2 will be described in detail below. The execution main body of the power utilization information acquisition equipment fault classification model training method can be a computer, a cloud server, an intelligent terminal or other electronic equipment capable of performing operation processing.

Step 210, performing data processing on the fault information sample set to obtain a training set.

For example, the fault information sample set may be derived from abnormal information data of a power consumption information collecting device fault meter or a collecting work order.

The fault types of the samples in the fault information sample set may include, but are not limited to, nine abnormality types: reverse electric quantity abnormality of the electric energy meter, operation error of the electric energy meter, voltage phase failure, backward running of the electric energy meter, stopped running of the electric energy meter, clock abnormality of the electric energy meter, uneven electric energy meter indication values, flying (runaway) of the electric energy meter, and abnormality of a long-term metering device. The nine abnormality types may serve as labels for the individual samples in the fault information sample set. Alternatively, each abnormality type may be represented by a class label value. For example, the class label values corresponding to the nine abnormality types may be represented as category 0, category 1, category 2, category 3, category 4, category 5, category 6, category 7, and category 8, respectively.

The faults of the electric energy meter are related to the equipment operation environment, the operation duration, and the equipment information. The samples in the fault information sample set may include attribute feature data such as the district number, terminal asset number, electric energy meter bureau number, operation time, inventory time, city company, terminal manufacturer, terminal model, electric energy meter manufacturer, electric energy meter type, electric energy meter hardware version, communication protocol, and electric energy meter wiring mode. The attribute feature data may be converted into attribute feature values that the electronic device is capable of processing.
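One possible way to convert categorical attribute feature data into numeric attribute feature values is integer coding, sketched below. The field names (`terminal_model`, `protocol`) and the first-seen ordering of the codes are hypothetical and serve only as an example; the application does not state which encoding is used.

```python
def fit_encoders(records, fields):
    """Give each distinct value of each field an integer code in first-seen order."""
    encoders = {f: {} for f in fields}
    for rec in records:
        for f in fields:
            # setdefault leaves an existing code unchanged and assigns the
            # next free integer to a value seen for the first time.
            encoders[f].setdefault(rec[f], len(encoders[f]))
    return encoders

def encode(record, fields, encoders):
    """Turn one record into a list of attribute feature values."""
    return [encoders[f][record[f]] for f in fields]
```

For instance, with the records `{"terminal_model": "A", "protocol": "P1"}` and `{"terminal_model": "B", "protocol": "P1"}`, the second record is encoded as `[1, 0]`.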

The sample set of fault information may be data collected by a human. Due to errors and uncertainty in manual work order abnormal information data acquisition, inherent defects of data loss, data repetition, data abnormality and the like exist in data in a fault information sample set.

Further, the fault information sample set is subjected to data processing to eliminate errors and other adverse effects caused by inherent defects of the fault information sample set, a normal data set including fault information such as class label values and attribute characteristic values before training is obtained, the precision of a subsequent classification learning and training weighted fusion model is improved, and therefore the accuracy and stability of subsequent fault classification prediction according to the training model are improved.

Step 220, inputting the training set into a plurality of classifiers for training to obtain the recall rate of each classifier for the different fault types.

For example, each training sample in the input training set may contain a plurality of attribute feature values, which may be expressed as: $X = (X_1, X_2, \ldots, X_k)$, where $X_k$ is the k-th attribute feature value and $k$ is the number of attribute feature values.

Further, each training sample in the input training set may include a class label value corresponding to the attribute feature values, which may be represented as $Y_i$, where $Y_i$ is the class label value of the i-th class and $i$ is the number of the category label. Optionally, the maximum value of $i$ is 9, which indicates that the input training set data has fault information of at most 9 category labels. The input training set data may then be specifically expressed as: $(X_1, X_2, \ldots, X_k; Y_i)$.

Specifically, the training samples in the training sets are input into each initial classifier for training and learning, and the mapping relationship between the attribute features and the label classes is found, so that a target classifier is learned for each initial classifier.

The target classifier can be simply expressed as: $y = f_j(x)$, $j = 1, 2, \ldots, m$, where $m$ is the number of classifiers. When new unknown fault information data with attribute features but without label categories is input into a target classifier, the target classifier predicts the input data so as to obtain the fault type of the unknown data.

The recall rate, also known as the recall ratio, is defined as the ratio of the true positive samples to all actual positive samples, that is, TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. The recall rate indicates how many of the positive examples in the fault information sample set are predicted correctly; that is, it measures, for each fault category, the accuracy with which positive examples are predicted, and thus the ability of the classifier to recognize positive examples.
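For a multi-class problem such as the nine fault categories, the recall rate can be computed separately for each category by treating that category as the positive class. A minimal sketch, assuming integer class labels from 0 to `n_classes - 1` (the function name is illustrative):

```python
def per_class_recall(y_true, y_pred, n_classes):
    """Recall of class c = correctly predicted samples of c / all true samples of c."""
    true_positive = [0] * n_classes
    support = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            true_positive[t] += 1
    # A class with no true samples has an undefined recall; 0.0 is reported here.
    return [true_positive[c] / support[c] if support[c] else 0.0
            for c in range(n_classes)]
```

With true labels `[0, 0, 1, 1, 1, 2]` and predictions `[0, 1, 1, 1, 0, 2]`, class 0 has recall 1/2, class 1 has recall 2/3, and class 2 has recall 1.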

Specifically, through training and learning of each initial classifier, the recall rate of each target classifier for each label category can be finally obtained.

Please refer to fig. 3, which is a flowchart of solving the recall ratio according to an embodiment of the present application. The specific flow shown in fig. 3 will be described in detail below.

Optionally, the recall rate is solved by a K-fold cross-validation method: the initially input training data is divided evenly into K equal parts; each time, one part is used for testing and the remaining parts are used for training; the experiment is repeated K times (five times when K is 5) and the results are averaged.

Specifically, step 220 may include steps 221 through 223.

Step 221, determining a training subset and a testing subset according to a current training set;

when the training is executed for the first time, the current training set is a training set; in the nth cycle, the current training set is the training subset used in the (n-1) th training. The value of n is a positive integer less than or equal to K.

Optionally, the ratio of the training subset to the testing subset is: (K-n): 1.

In one example, K can be 4, 5, 6, 7, 8, 9, etc.

Step 222, inputting the training subset into the plurality of classifiers for training to obtain intermediate classifiers;

Step 223, inputting the testing subset into the intermediate classifiers for testing to obtain the target classifiers;

Steps 221 to 223 are repeated until n is equal to K.

Step 224, calculating the average value of the classification accuracy over the K rounds to obtain the recall rate.

Optionally, the value of K is 5. The processed fault information sample set is divided cyclically five times into different training and test sets that form complementary subsets, 5-fold cross-validation is performed, and the average of the five validation classification accuracy results is taken as the recall rate. In this way the fault information sample set is learned from multiple angles and local extrema are avoided, which improves the stability of the recall rates obtained by cross-validation.
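The cross-validation procedure can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the helper names are hypothetical, the folds are formed by a seeded shuffle, and a simple 1-nearest-neighbor rule stands in for the actual classifiers. The recall rate of each class is averaged over the folds in which that class occurs.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and cut them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbor_predict(train_X, train_y, test_X):
    """Stand-in classifier (assumption): 1-nearest neighbor, squared Euclidean distance."""
    preds = []
    for x in test_X:
        best = min(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
        preds.append(train_y[best])
    return preds


def cross_validated_recall(X, y, k=5, predict=nearest_neighbor_predict):
    """Hold out each fold once, then average the per-class recall over the folds."""
    per_fold = defaultdict(list)
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        y_pred = predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                         [X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        actual = Counter(y_test)
        hits = Counter(t for t, p in zip(y_test, y_pred) if t == p)
        for label, count in actual.items():
            per_fold[label].append(hits[label] / count)
    return {label: sum(v) / len(v) for label, v in per_fold.items()}


# Toy data: two well-separated fault classes with ten samples each.
X = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 5.0) for i in range(10)]
y = [0] * 10 + [1] * 10
recall = cross_validated_recall(X, y, k=5)
print(recall)
```

Every sample is used for testing exactly once and for training K-1 times, which is what makes the averaged recall rate less sensitive to any single split.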

Step 230, performing weighted fusion on the plurality of classifiers according to the recall rate to obtain a target fault classification model.

Exemplarily, in order to find the mapping relationship between the attribute feature values and the class labels, the fault information sample set, after a series of data processing steps, is input into a plurality of initial classifiers for training, yielding for each target classifier a mapping function f(x) from the attribute feature input variable x to the discrete class label output variable y. The recall rate represents the classification accuracy of a classifier for each fault category. The recall rate obtained by training each initial classifier is used as the weight of the mapping function of the corresponding target classifier, a target fault classification model built on the plurality of target classifiers is obtained through weighted fusion, and fault data with unknown class labels is subsequently classified according to the target fault classification model.

Please refer to fig. 4, which is a flowchart illustrating a method for training a target fault classification model according to an embodiment of the present disclosure. The specific flow shown in fig. 4 will be described in detail below.

Optionally, step 230 may include: steps 231 through 233.

Step 231, dividing the fault information sample set into a training set and a test set according to a proportion;

Specifically, the total number of samples in the fault information sample set is denoted T, the number of sample label classes m, and the number of classifiers n. The fault information sample set is divided into a training set and a test set at a ratio of 7:3. If the test set is allotted too few samples, the fault types in the test set are incomplete, which degrades the subsequent verification of the model on the test set; if the test set is allotted too many samples, too few training samples remain, which reduces the accuracy of the trained classification model. The test set is therefore allotted three tenths of the samples. A series of data processing steps is then applied to the resulting training set to eliminate the influence of defects in the data, and the samples of the new training set so obtained are used as the input training set samples.
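A 7:3 division can be sketched as follows. The text does not say how samples are assigned to each part; this sketch assumes a stratified split, so that every fault type appears in the test set, which addresses the concern about incomplete fault types. The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.3, seed=0):
    """Return (train_idx, test_idx), splitting every class by the same ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)  # samples of this class held out
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy label vector: three fault types with ten samples each.
labels = [0] * 10 + [1] * 10 + [2] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```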

Step 232, inputting the training set after data processing into a plurality of classifiers for training to obtain a recall rate;

Specifically, five-fold cross-validation is performed on the plurality of initial classifiers based on the input training set samples to obtain the recall rates R of each target classifier for the different categories, which can be expressed as:

$$R = \{R_{ij}\}, \quad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, n,$$

where $R_{ij}$ is the recall rate of the jth classifier on samples of the ith class. Meanwhile, through repeated training, the parameters of the initial classifiers are continuously adjusted, and the mapping function of each target classifier is finally obtained, which can be simply expressed as:

$$f_j(x), \quad j = 1, 2, \ldots, n,$$

where $n$ is the number of classifiers.

Step 233, inputting the test set into the plurality of classifiers for prediction classification, and performing weighted fusion based on the recall rate to obtain the target fault classification model.

Specifically, the test set (the 30% portion divided from the fault information sample set) is used for classification prediction with the trained target classifier mapping functions, so as to obtain, for each target classifier, the probability value of a sample being assigned to each class label.

The class label recall rate of a target classifier is used as the weight. The probability value with which a target classifier predicts a class label is denoted $p_{ij}$, i.e. the probability value predicted by the jth target classifier for the ith class.

The product of the recall rate for a class label and the probability value with which the target classifier assigns that class label is taken as the score that the target classifier outputs for that class label. Summing the scores over all target classifiers gives the weighted-fusion target fault classification model, expressed by the following formula:

$$L_i = \sum_{j=1}^{n} R_{ij}\, p_{ij},$$

where $R_{ij}$ is the recall rate of the jth target classifier on samples with the ith class label, $p_{ij}$ is the probability value with which the jth target classifier predicts class i, $n$ is the number of target classifiers scoring the ith class label, and $L_i$ is the final score with which the target fault classification model assigns class label i.
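The recall-weighted fusion described here, in which the score of each class is the sum over classifiers of recall rate times predicted probability, can be illustrated with a short Python sketch. The recall rates and probabilities below are hypothetical numbers for three classifiers and two classes, and `fuse_scores` is a name introduced only for this example.

```python
def fuse_scores(recall, proba):
    """recall[j][i]: recall rate of classifier j on class i.
    proba[j][i]: probability that classifier j assigns to class i for one sample.
    Returns the list of fused scores, one per class: sum over j of recall * proba."""
    n_classes = len(proba[0])
    return [sum(recall[j][i] * proba[j][i] for j in range(len(proba)))
            for i in range(n_classes)]


# Hypothetical values for three classifiers (rows) and two classes (columns).
recall = [[0.9, 0.5], [0.6, 0.8], [0.7, 0.7]]
proba = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
scores = fuse_scores(recall, proba)
print(scores)
```

In this example the first classifier favors class 0, but the second classifier, which has the higher recall rate on class 1, tips the fused score towards class 1.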

Referring to fig. 4, step 210 may include: step 211 and step 212.

For example, the fault information sample set is derived from the abnormal information data of fault metering or acquisition work orders of the power consumption information acquisition equipment, and the number of fault information samples in each category of the collected abnormal information data is shown in Table 1 below.

TABLE 1

Step 211, performing data preprocessing on the fault information sample set to obtain an initial training set.

Illustratively, the data in Table 1 is the raw data of the acquired fault information sample set, and it can be seen that the data is unrelated and irregular. Given the complexity of the raw data, data preprocessing operations such as screening, filling and classification can be applied before classifier training begins. This prevents defects in the data from making the classification algorithm impossible to compute directly; processing the data also speeds up classifier training and improves the precision of the classification model.

Optionally, step 211 may include: step 211a, step 211b and step 211c.

Step 211a, identifying the fault information sample set to determine missing samples, and processing the missing samples to obtain an initial training set.

Illustratively, if the original fault information data sample set has missing data, the missing data can weaken the effectiveness of the attribute features in the original data sample set, and great errors are caused to the final prediction result of the trained weighted fusion classification model. Therefore, the acquired fault information sample set is identified, the missing values are processed, and the influence of the missing data is eliminated.

Specifically, step 211a may include: step 211a1 and step 211a2.

In step 211a1, if the data type of the missing sample is a discrete type, the missing sample is deleted from the fault information sample set.

Because the power consumption information acquisition system covers a large scale, the detailed information of many electric energy meters is difficult to look up. Where the attribute features of the missing data are discrete variables, the affected samples are, for example, samples missing attribute features such as the electric energy meter manufacturer or the terminal model.

Optionally, PyCharm is a Python IDE (Integrated Development Environment) with a complete set of tools that help users work more efficiently when developing in Python, such as debugging, syntax highlighting, project management, code navigation, intelligent hints, autocompletion, unit testing and version control.

Further, these discrete data samples are not suitable for training the model and are not easy to fill, so they are deleted directly in PyCharm.

In step 211a2, if the data type of the missing sample is continuous, the missing sample is filled.

Further, where the attribute features of the missing data are continuous variables, the affected samples are, for example, samples missing attribute features such as the running time of the electric energy meter. Samples from the neighboring batch may be selected for filling, so that the missing value follows the running-time attribute of the neighboring data.

Optionally, the continuous missing data is filled by calling the fillna() function of a Python library in PyCharm.
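The two rules, deleting samples with missing discrete attributes and filling missing continuous attributes from neighboring samples, can be sketched without any library as follows. The function `clean_missing` and the field names `manufacturer` and `run_hours` are hypothetical; in practice a library call such as the `fillna` method of a pandas DataFrame would play the role of the filling loops.

```python
def clean_missing(rows, discrete_keys, continuous_keys):
    """Drop rows with a missing discrete attribute; fill a missing continuous
    attribute from the nearest preceding row, or the nearest following row
    when no preceding value exists."""
    kept = [dict(r) for r in rows
            if all(r.get(k) is not None for k in discrete_keys)]
    for key in continuous_keys:
        last = None
        for row in kept:  # forward fill from the previous neighbor
            if row.get(key) is None:
                row[key] = last
            else:
                last = row[key]
        nxt = None
        for row in reversed(kept):  # backward fill for gaps at the start
            if row.get(key) is None:
                row[key] = nxt
            else:
                nxt = row[key]
    return kept


# Hypothetical records; None marks a missing value.
rows = [
    {"manufacturer": "A", "run_hours": None},
    {"manufacturer": "A", "run_hours": 1200.0},
    {"manufacturer": None, "run_hours": 1300.0},
    {"manufacturer": "B", "run_hours": None},
    {"manufacturer": "B", "run_hours": 1500.0},
]
cleaned = clean_missing(rows, ["manufacturer"], ["run_hours"])
print(cleaned)
```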

For continuous missing data, the nearest-neighbor data is interpolated to supply the missing values; discrete missing data is unsuitable for training the model and is deleted directly. Eliminating the influence of missing data in this way can improve the training effect.

Step 211b, or, normalizing the fault information sample set to obtain an initial training set.

Illustratively, the raw data in the acquired fault information sample set is normalized according to the following formula:

$$x_i^{*} = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}},$$

where $x_i^{*}$ is the normalized data of the fault information samples of the ith class label, $x_i$ is the raw data in the fault information sample set, $x_i^{\max}$ is the maximum value of the fault information samples of the ith class label, and $x_i^{\min}$ is the minimum value of the fault information samples of the ith class label.

By uniformly transforming the original data samples, all data is scaled into the same interval, which is [0, 1] when the maximum and minimum values are used as described above. Given a sufficient volume of data, a small number of outliers has little influence on the mean, which improves the accuracy of subsequent classification training. Scaling the different attribute feature values to the same numerical interval also allows different indexes to be compared with one another and effectively eliminates the effect that data in different units and magnitudes would otherwise have on the weights of the final classification model.
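A small numeric example may help. The sketch below assumes the min-max form implied by the maximum and minimum values defined for the formula; the function name and the running-time values are hypothetical.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical running times of four meters, in hours.
run_hours = [1200.0, 1500.0, 1800.0, 2400.0]
scaled = min_max_normalize(run_hours)
print(scaled)
```

After scaling, an attribute measured in hours and one measured in, say, kilowatt-hours both lie in the same interval, so neither dominates a distance-based classifier merely because of its unit.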

Step 211c, or, performing correlation analysis on the fault information sample set to determine the correlation of each sample in the fault information sample set, and screening the samples according to that correlation to obtain an initial training set.

For example, in the process of data preprocessing of the original fault information data sample, data redundancy and repetition also reduce the efficiency of subsequent training and learning, and reduce the accuracy of the training model. Therefore, correlation analysis can be performed on the original fault information samples before training, and the correlation of each sample in the sample set is determined.

Optionally, the correlation analysis is performed according to the Pearson correlation coefficient, calculated as follows:

$$\rho_{ik} = \frac{\operatorname{cov}(X_k, Y_i)}{\sigma_{X_k}\,\sigma_{Y_i}},$$

where $\rho_{ik}$ is the correlation coefficient between the ith class label and the kth attribute feature, $X_k$ is the feature value of the kth attribute, $Y_i$ is the ith class label value, $\operatorname{cov}$ denotes the covariance and $\sigma$ the standard deviation.

Optionally, attributes whose mutual correlation coefficient is greater than 0.7 are determined to be redundant attribute features, and only one of each group of redundant attribute features is retained. The correlation coefficients between the features and the fault information types are then sorted by absolute value from large to small; the attribute features in the first two thirds, which are strongly correlated, are retained, and the remaining weakly correlated attribute features are removed.

Eliminating the attribute features that are weakly correlated with the fault category labels, together with the redundant features, screens out the data attribute features most useful for building the classification model and yields the unbalanced data set to be processed.
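The screening rule can be sketched as follows. Several details are assumptions made for this example only: the redundancy test uses the absolute value of the coefficient, the member of a redundant pair that correlates more strongly with the label is the one retained, and the retained count is rounded up. The function names and the toy feature columns are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:  # constant column carries no correlation information
        return 0.0
    return cov / math.sqrt(va * vb)


def screen_features(features, label, redundancy=0.7):
    """features: {name: column}. Returns the names of the retained features."""
    # Rank features by |correlation with the label|, strongest first.
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], label)),
                    reverse=True)
    # Drop a feature that is redundant with a stronger feature already kept.
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) <= redundancy for k in kept):
            kept.append(name)
    # Keep the most strongly correlated two thirds (rounded up).
    n_keep = max(1, math.ceil(len(kept) * 2 / 3))
    return kept[:n_keep]


label = [0, 0, 1, 1, 2, 2]
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_copy": [2, 4, 6, 8, 10, 12],   # perfectly correlated with "a"
    "b": [1, -1, 1, -1, 1, -1],       # uncorrelated with the label
    "d": [0, 1, 2, 0, 1, 2],          # moderately correlated with the label
}
selected = screen_features(features, label)
print(selected)
```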

Step 212, adjusting the imbalance ratio of the initial training set to obtain the training set.

Illustratively, Table 2 gives the percentage of each category in the collected fault information sample set. It can be seen that the distribution of fault information sample categories is highly unbalanced. Majority-class faults such as the electric energy meter clock anomaly category account for nearly half of the whole fault information sample set, whereas minority-class faults such as the long-term metering device fault category account for only 0.36%.

TABLE 2

Step 212 may include step 212a and step 212b.

In step 212a, SMOTE oversampling is applied to the minority class samples so that the minority classes are raised to the same number of samples as the majority class. The oversampling is calculated as follows:

$$x_{new} = x + \operatorname{rand}(0, 1) \times (\tilde{x} - x),$$

where $x$ is a minority class sample, $\tilde{x}$ is a sample selected at random from the nearest neighbors of $x$, and $x_{new}$ is a point selected at random on the line connecting $x$ to $\tilde{x}$, which serves as new minority class sample data. The SMOTE oversampling algorithm is applied separately to the fault type of each class label, increasing the number of minority class samples to the number of samples in the largest majority class.
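The interpolation step of SMOTE can be sketched in plain Python as follows. This is a simplified illustration, not a full SMOTE implementation: the function name, the neighborhood size `k` and the toy points are assumptions, and a maintained library would normally be used in practice.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x, excluding x itself
        neighbors = sorted((p for p in minority if p is not x),
                           key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        x_nn = rng.choice(neighbors)
        gap = rng.random()  # random factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, x_nn)))
    return synthetic


# Hypothetical minority class with 4 samples; the majority class has 10,
# so 6 synthetic samples are needed to reach the same count.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=10 - len(minority))
print(len(minority) + len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stays inside the region already occupied by that fault type rather than duplicating samples exactly.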

Step 212b, performing undersampling globally on the data set after SMOTE oversampling, so as to remove noise from the fault information data of each class label.

Optionally, ENN (Edited Nearest Neighbours) undersampling is employed. For a majority class sample, ENN obtains its K nearest neighbor samples; if all or most of those neighbors belong to a different class from the sample, the sample is regarded as noise and removed. The ENN algorithm can reduce the number of majority class samples to some extent, but the number of samples it removes is limited.
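The ENN rule can be sketched as follows. In line with the global application mentioned in step 212b, this sketch checks every sample rather than only majority class samples, which is an assumption of the example; the function name, `k=3` and the toy points are likewise hypothetical.

```python
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Remove every sample whose class differs from the majority class
    among its k nearest neighbors."""
    keep = []
    for i, x in enumerate(X):
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: sum((a - b) ** 2 for a, b in zip(x, X[j])))[:k]
        majority_label, _ = Counter(y[j] for j in nearest).most_common(1)[0]
        if majority_label == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Two clean clusters plus one noisy sample labelled 0 inside the class-1 cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
     (5.05, 5.05)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]
X_clean, y_clean = edited_nearest_neighbours(X, y)
print(len(X_clean))
```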

Specifically, a combined sampling scheme of oversampling followed by undersampling is adopted to eliminate the imbalance and form a data set with balanced fault types. The classifiers are then built and trained on the balanced data set, which improves the detection rate for minority class fault samples.

Furthermore, the acquired raw data is processed twice. Data preprocessing is carried out first, to reduce as far as possible the influence of defects in the data on subsequent classification training. The preprocessed data is then processed to eliminate its imbalance, so that more stable and reliable training data is obtained.

Referring to fig. 2, the plurality of classifiers are one or more of K-nearest neighbor classifiers, decision tree classifiers, support vector machine classifiers, Bayesian classifiers, or random forest classifiers.

Illustratively, the classifiers to be trained are a combination of several commonly used classifiers, such as a K-nearest neighbor classifier, a decision tree classifier, a support vector machine classifier, a Bayesian classifier, or a random forest classifier. Training is carried out on these common classifiers, and the final target fault classification model is obtained after weighted fusion based on the recall rate.

Optionally, the fault information sample set is likewise divided into a training set and a test set at a ratio of 7:3. A series of data processing steps, including SMOTE oversampling and ENN undersampling, is applied to the training set to eliminate the data imbalance. When the fusion model is trained, a K-nearest neighbor classifier, a decision tree and a random forest are selected as the three base classifiers and weighted by recall rate.

The three base classifiers are trained on the divided training set using the power consumption information acquisition equipment fault classification model training method. Table 3 shows the recall rates for the different class label faults obtained by the three base classifiers and by the target fault classification model; the recall rate indicates the classification accuracy for each class.

As can be seen from Table 3, the different base classifiers each achieve the best classification accuracy on different classes. The K-nearest neighbor classifier performs better on categories 7 and 8, the decision tree outperforms the K-nearest neighbor classifier and the random forest classifier on categories 2 and 5, and the random forest classifier has higher classification accuracy on categories 0, 1 and 3.

TABLE 3

Optionally, a K-nearest neighbor classifier, a decision tree and a random forest are used as the three base classifiers to train the fusion model and obtain the target fault classification model. Comparing the classification accuracy of the target fault classification model with that of the base classifiers shows that, except for categories 0, 1 and 7, where its accuracy is lower than that of an individual base classifier, the target fault classification model achieves the highest classification accuracy in every category. The model thus combines the strengths of the three different base classifiers, which demonstrates the effectiveness of building a classification model by recall-weighted fusion for fault classification of the intelligent electric energy meters of the power consumption acquisition system.

Further, Table 4 compares the three base classifiers and the target fault classification model on the test set in terms of performance indexes such as classification accuracy, recall rate, F-score and geometric mean.

It can be seen that the target fault classification model improves the accuracy, recall rate, F-score and geometric mean by 1.68%, 3.64%, 0.85% and 2.54%, respectively. The target fault classification model therefore performs well in fault classification and prediction for the intelligent electric energy meters of the power consumption acquisition system and can improve classification accuracy.
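The four performance indexes can be computed as in the sketch below. The text does not define how they are aggregated across fault types; this example assumes macro-averaged recall rate and F-score, and takes the geometric mean as the geometric mean of the per-class recall rates. The labels are hypothetical and unrelated to the values reported in Table 4.

```python
import math


def evaluate(y_true, y_pred):
    """Accuracy, macro recall, macro F-score and geometric mean of class recalls."""
    recalls, f_scores = [], []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recall = tp / (tp + fn)
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f_scores.append(f1)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "recall": sum(recalls) / len(recalls),
        "f_score": sum(f_scores) / len(f_scores),
        "g_mean": math.prod(recalls) ** (1 / len(recalls)),
    }


# Hypothetical test labels for three fault types.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Unlike accuracy, the geometric mean collapses towards zero when any single fault type is poorly recognized, which makes it informative for unbalanced fault data.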

TABLE 4

In order to cooperate with the power consumption information acquisition equipment fault classification model training method, the embodiment of the application also provides a power consumption information acquisition equipment fault classification model training device.

Based on the same application concept, a fault classification model training device corresponding to the power consumption information acquisition equipment fault classification model training method is further provided in the embodiment of the present application, and as the principle of solving the problem of the device in the embodiment of the present application is similar to that of the aforementioned power consumption information acquisition equipment fault classification model training method embodiment, the implementation of the device in the embodiment of the present application can refer to the description in the embodiment of the above method, and repeated parts are not repeated.

Please refer to fig. 5, which is a schematic diagram of a functional module of a power consumption information collecting device fault classification model training apparatus according to an embodiment of the present application. Each module in the power consumption information acquisition equipment fault classification model training device in this embodiment is used for executing each step in the above method embodiments. The power utilization information acquisition equipment fault classification model training device comprises a processing module 310, a training module 320 and a fusion module 330.

The processing module 310 is configured to process the fault information samples to obtain a training set;

the training module 320 is configured to input the training set to multiple classifiers for training, so as to obtain recall rates of different fault types of the multiple classifiers;

and the fusion module 330 is configured to perform weighted fusion on the multiple classifiers according to the recall rate to obtain a target fault classification model.

In a possible implementation manner, the processing module 310 includes a first processing unit and an adjusting unit:

the first processing unit is used for carrying out data preprocessing on the fault information sample set to obtain an initial training set;

and the adjusting unit is used for adjusting the unbalance proportion of the initial training set to obtain the training set.

In a possible implementation manner, the first processing unit is configured to:

identifying the fault information sample set to determine a missing sample; processing the missing samples to obtain an initial training set;

or, carrying out standardization processing on the fault information sample set to obtain an initial training set;

or, performing correlation analysis on the fault information sample set to determine the correlation in the fault information sample set, and screening samples according to the correlation of each sample in the fault information sample set to obtain an initial training set.

In a possible implementation manner, the first processing unit may be configured to:

identifying the fault information sample set to determine a missing sample; processing the missing samples to obtain an initial training set; or, carrying out standardization processing on the fault information sample set to obtain an initial training set; or, performing correlation analysis on the fault information sample set to determine the correlation of each sample in the fault information sample set, and screening the samples according to the correlation of each sample in the fault information sample set to obtain an initial training set.

If the data type of the missing sample is discrete, deleting the missing sample from the fault information sample set; and if the data type of the missing sample is continuous, filling the missing sample.

In a possible embodiment, the adjusting unit may be configured to:

performing oversampling processing on a few types of samples of the initial training set to obtain an oversampling initial training set; and carrying out undersampling treatment on the oversampled initial training set, and eliminating the noise value of the oversampled initial training set to obtain the training set for eliminating unbalanced data.

In a possible implementation, the training module 320 may be configured to:

the classifiers are one or more of K nearest neighbor classifiers, decision tree classifiers, support vector machine classifiers, Bayesian classifiers or random forest classifiers in any combination.

In a possible implementation manner, the fusion module 330 may be configured to:

Please refer to fig. 6, which is a flowchart of a method for determining the fault type of power consumption information acquisition equipment according to an embodiment of the present application. The specific flow shown in fig. 6 will be described in detail below.

Step 410, acquiring power consumption fault information data to be identified;

Illustratively, the power consumption fault information data to be identified is derived from the abnormal information data of fault metering or acquisition work orders of the power consumption information acquisition equipment.

Step 420, inputting the power consumption fault information data to be identified into the target fault classification model determined by the power consumption information acquisition equipment fault classification model training method for identification, so as to determine the fault type of the power consumption fault information data to be identified.

Illustratively, a weighted-fusion classification function, or classification model, is learned through the power consumption information acquisition equipment fault classification model training method and used as the target fault classification model, which can be expressed as:

$$L_i = \sum_{j=1}^{n} R_{ij}\, p_{ij}.$$

Optionally, the scores of the class labels under the target fault classification model are compared by a maximum calculation:

$$\max_{i} L_i.$$

The class label with the highest score is the final classification result for the power consumption fault information data to be identified, and can be represented by the following expression:

$$\hat{y} = \arg\max_{i} L_i,$$

where $\max_{i} L_i$ is the maximum value of the class label scores $L_i$ of the target fault classification model, and the classification result $\hat{y}$ is one of the 9 fault types, namely the predicted class label of the power consumption fault information data to be identified.
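Selecting the fault type with the highest fused score amounts to an argmax over the class label scores. The sketch below uses nine placeholder class names and hypothetical scores; neither comes from the embodiment.

```python
def predict_fault_type(scores, class_names):
    """Return the class name with the highest fused score, and that score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best], scores[best]


# Placeholder names for the 9 fault types and hypothetical fused scores.
class_names = [f"fault_type_{i}" for i in range(9)]
scores = [0.41, 1.07, 0.35, 2.10, 0.98, 0.12, 1.66, 0.54, 0.73]
label, score = predict_fault_type(scores, class_names)
print(label, score)
```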

Based on the same application concept, the embodiment of the present application further provides a power consumption information acquisition device fault classification device corresponding to the power consumption information acquisition device fault type determination method, and as the principle of solving the problem of the device in the embodiment of the present application is similar to that of the aforementioned power consumption information acquisition device fault type determination method, the implementation of the device in the embodiment of the present application may refer to the description in the embodiment of the above method, and repeated parts are not described again.

Please refer to fig. 7, which is a schematic diagram of the functional modules of a fault classification apparatus for power consumption information acquisition equipment according to an embodiment of the present application. Each module in the fault classification apparatus for power consumption information acquisition equipment in this embodiment is used for executing each step in the above method embodiment. The fault classification apparatus for power consumption information acquisition equipment comprises an obtaining module 510 and a classification module 520, wherein:

an obtaining module 510, configured to obtain the to-be-identified power consumption fault information data;

the classification module 520 may be configured to input the power consumption fault information data to be identified into the target fault classification model determined by the power consumption information collection device fault classification model training method for identification, so as to determine the fault type of the power consumption fault information data to be identified.

In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the power consumption information acquisition device fault classification model training method and the power consumption information acquisition device fault type determination method described in the above method embodiments are executed.

The computer program product of the power consumption information acquisition device fault classification model training method and the power consumption information acquisition device fault type determination method provided in the embodiments of the present application includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the power consumption information acquisition device fault classification model training method and the power consumption information acquisition device fault type determination method described in the embodiments of the above methods, which may be specifically referred to the embodiments of the above methods, and are not described herein again.

To sum up, the embodiment of the present application provides a method and an apparatus for training a fault classification model of power consumption information acquisition equipment, where the method for training the fault classification model of the power consumption information acquisition equipment includes: performing data processing on the fault information sample set to obtain a training set; inputting the training set into a plurality of classifiers for training to obtain recall rates of different fault types of the plurality of classifiers; and performing weighted fusion on the plurality of classifiers according to the recall rate to obtain a target fault classification model. The method for determining the fault type of the electricity information acquisition equipment comprises the following steps: acquiring to-be-identified power utilization fault information data; and inputting the power utilization fault information data to be identified into a target fault classification model determined by the power utilization information acquisition equipment fault classification model training method for identification so as to determine the fault type of the power utilization fault information data to be identified.

In the implementation process, data processing is carried out through acquired fault information sample data, the processed data are input into a plurality of classifiers for training, the recall rate obtained by training of each classifier is used as the weight of the same class of each classifier, a target fault classification model established based on the plurality of classifiers is obtained through weighted fusion, and subsequent prediction classification of unknown class label fault data is carried out according to the target fault classification model. Therefore, the advantages of different single classifiers are embodied, and the accuracy and the stability of effective classification prediction of the training model are improved.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. The terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The above description covers only specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A fault classification model training method for power consumption information acquisition equipment, characterized by comprising the following steps:

performing data processing on the fault information sample set to obtain a training set; inputting the training set into a plurality of classifiers for training, wherein, based on a K-fold cross-validation method, different complementary subsets are obtained, K rounds of cross-validation are performed on the complementary subsets to obtain K classification accuracy rates, and the K classification accuracy rates are averaged to obtain the recall rates of the plurality of classifiers for different fault types;

performing weighted fusion on the plurality of classifiers according to the recall rates to obtain a target fault classification model; wherein the recall rate represents the accuracy with which each fault type is correctly predicted and is used as the weight of the plurality of classifiers; the higher the accuracy with which the plurality of classifiers correctly predict a given fault type, the larger the weight for that fault type;

wherein, the data processing of the fault information sample set to obtain a training set includes:

deleting discrete missing samples in the fault information samples, and/or filling continuous missing samples in the fault information samples with adjacent data of the missing samples.

2. The method of claim 1, wherein performing data processing on the fault information sample set to obtain a training set comprises:

performing data preprocessing on the fault information sample set to obtain an initial training set;

and adjusting the unbalance proportion of the initial training set to obtain the training set.

3. The method of claim 2, wherein performing data preprocessing on the fault information sample set to obtain an initial training set comprises:

or, performing correlation analysis on the fault information sample set to determine the correlation of each sample in the fault information sample set, and screening the samples according to the correlation of each sample in the fault information sample set to obtain an initial training set.

4. The method of claim 2, wherein adjusting the imbalance ratio of the initial training set to obtain the training set comprises:

performing oversampling on the minority-class samples of the initial training set to obtain an oversampled initial training set;

and performing undersampling on the oversampled initial training set and eliminating noise values from the oversampled initial training set, to obtain the training set in which the data imbalance has been eliminated.

5. The method of claim 1, wherein the plurality of classifiers are any combination of one or more of a K-nearest neighbor classifier, a decision tree classifier, a support vector machine classifier, a Bayesian classifier, and a random forest classifier.

6. A method for determining the fault type of power consumption information acquisition equipment, characterized by comprising the following steps:

acquiring power consumption fault information data to be identified;

inputting the power consumption fault information data to be identified into a target fault classification model determined by the power consumption information acquisition equipment fault classification model training method according to any one of claims 1 to 5 for identification so as to determine the fault type of the power consumption fault information data to be identified.

7. A fault classification model training device for power consumption information acquisition equipment, characterized by comprising:

a processing module, configured to perform data processing on the fault information sample set to obtain a training set; the processing module is specifically configured to: delete discrete missing samples in the fault information samples, and/or fill continuous missing samples in the fault information samples with data adjacent to the missing samples;

a training module, configured to input the training set into a plurality of classifiers for training, wherein, based on a K-fold cross-validation method, different complementary subsets are obtained, K rounds of cross-validation are performed on the complementary subsets, and the resulting K classification accuracy rates are averaged to obtain the recall rates of the plurality of classifiers for different fault types;

a fusion module, configured to perform weighted fusion on the plurality of classifiers according to the recall rates to obtain a target fault classification model; wherein the recall rate represents the accuracy with which each fault type is correctly predicted and is used as the weight of the plurality of classifiers; the higher the accuracy with which the plurality of classifiers correctly predict a given fault type, the larger the weight for that fault type.

8. A fault classification device for power consumption information acquisition equipment, characterized by comprising:

an acquisition module, configured to acquire power consumption fault information data to be identified;

a classification module, configured to input the power consumption fault information data to be identified into the target fault classification model determined by the power consumption information acquisition equipment fault classification model training method according to any one of claims 1 to 5 for identification, so as to determine the fault type of the power consumption fault information data to be identified.

9. An electronic device, comprising: a processor and a memory storing machine-readable instructions executable by the processor, wherein, when the electronic device runs, the machine-readable instructions are executed by the processor to perform the steps of the method of any one of claims 1 to 6.

10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.