Disclosure of Invention
The invention aims to provide a method for predicting whether an in-service electric energy meter will fail by learning the fault characteristics of dismantled meters. The method reduces the risk of a concentrated outbreak of batch faults among in-service electric energy meters and improves working efficiency.
The invention adopts the following technical scheme:
A method for predicting whether an in-service electric energy meter will fail by learning the fault characteristics of dismantled meters, wherein an XGBoost model learns, from the sorting results of dismantled meters and the historical operation events of the meters, the rules that determine whether an electric energy meter is faulty, and the learned rules are then used to predict whether an in-service electric energy meter will fail.
The method specifically comprises the following steps:
step one, data acquisition: collecting data on dismantled and sorted meters;
step two, data preprocessing: processing the acquired data into a form the model can use;
step three, modeling analysis: analyzing the data preprocessed in step two with the XGBoost model, and learning the rules for judging whether an electric energy meter is faulty;
step four, application of the results: applying the learned rules to the in-service meter data processed in step two to judge whether each in-service electric energy meter will fail.
In the method, the data source in step one is an ORACLE database system.
In the method, the data are collected from the MDS system contained in the ORACLE database system.
In the method, the data on dismantled and sorted meters in step one comprise acquisition anomaly data, metering anomaly data, dismantling and sorting data, and archive data.
In the method, the data usable by the model in step two comprise data for learning and data for prediction.
In the method, the data processing in step two is performed by a database stored procedure.
In the method, the XGBoost model in step three is trained and used for prediction in Python.
In the method, in step four, the dismantled-meter fault rules learned by the XGBoost model are compared against the data of in-service electric energy meters to judge whether each in-service meter is faulty.
The method further comprises step five, visualization: displaying the results visually.
The invention has the beneficial effects that:
1. The invention finds the rules for judging whether an electric energy meter is faulty by learning from historical data, and can automatically learn updated rules from the latest data.
2. The invention forecasts electric energy meter faults, allowing staff to anticipate future faults, strengthen risk prevention measures, and reduce the risk of a concentrated outbreak of batch faults.
3. The entire analysis process requires no manual intervention, which improves working efficiency.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A method for predicting whether an in-service electric energy meter will fail by learning the fault characteristics of dismantled meters uses an XGBoost model to learn, from the sorting results of dismantled meters and the historical operation events of the meters, the rules that determine whether an electric energy meter is faulty, and then uses the learned rules to predict whether an in-service electric energy meter will fail.
The method specifically comprises the following steps:
Step one, data acquisition.
Data on dismantled and sorted meters are collected. The data source is an ORACLE database system, and the data are collected from the MDS system contained in it. The data comprise acquisition anomaly data, metering anomaly data, dismantling and sorting data, and archive data.
Step two, data preprocessing.
The acquired data are processed into a form the model can use. The processing is performed by a database stored procedure. The data usable by the model comprise data for learning and data for prediction.
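The split into learning data and prediction data can be illustrated with a minimal sketch. In the method itself this processing is done by a database stored procedure; the Python below only illustrates the idea, and the record layout, the field names (`bar_code`, `status`, `sorting_result`, `acq_anomalies`, `metering_anomalies`) and all values are hypothetical.

```python
# Hypothetical meter records; field names and values are illustrative only.
records = [
    {"bar_code": "A001", "status": "dismantled", "sorting_result": "fault",
     "acq_anomalies": 5, "metering_anomalies": 2},
    {"bar_code": "A002", "status": "dismantled", "sorting_result": "no fault",
     "acq_anomalies": 0, "metering_anomalies": 0},
    {"bar_code": "A003", "status": "in_service", "sorting_result": None,
     "acq_anomalies": 3, "metering_anomalies": 1},
]

# Assumed feature columns, in the order they are fed to the model.
FEATURES = ["acq_anomalies", "metering_anomalies"]


def split_for_model(rows):
    """Return (train_X, train_y, predict_ids, predict_X).

    Dismantled meters have a sorting result and form the learning data;
    in-service meters have no label and form the prediction data.
    """
    train_X, train_y, predict_ids, predict_X = [], [], [], []
    for row in rows:
        x = [row[name] for name in FEATURES]
        if row["status"] == "dismantled":
            train_X.append(x)
            train_y.append(1 if row["sorting_result"] == "fault" else 0)
        else:
            predict_ids.append(row["bar_code"])
            predict_X.append(x)
    return train_X, train_y, predict_ids, predict_X


train_X, train_y, predict_ids, predict_X = split_for_model(records)
```

Keeping the bar codes of the prediction rows alongside their features makes it possible to map each prediction back to a physical meter later.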
Step three, modeling analysis.
The data preprocessed in step two are analyzed with the XGBoost model, which learns the rules for judging whether an electric energy meter is faulty. The XGBoost model is trained and used for prediction in Python.
The principle of the XGBoost model used by the invention is as follows:
XGBoost is a boosting algorithm. The idea of boosting is to combine many weak classifiers into one strong classifier. Because XGBoost is a boosted tree model, it combines many tree models into a strong classifier; the tree model used is the CART regression tree. The algorithm keeps adding trees, growing each tree by repeated feature splitting. Each added tree is in effect a new function learned to fit the residual of the previous prediction. Once training has produced k trees, the score of a sample is predicted as follows: according to its features, the sample falls into one leaf node in each tree, each leaf node carries a score, and the predicted value of the sample is simply the sum of the scores from all the trees.
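The additive prediction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's internal representation: the nested-dict tree layout, the thresholds and the leaf scores are all made up for the example.

```python
# Each tree is a nested dict. Internal nodes hold a feature index, a threshold
# and two children; leaves hold a score. All values are hypothetical.
tree_1 = {"feature": 0, "threshold": 2.0,
          "left": {"score": -0.5},
          "right": {"score": 0.5}}
tree_2 = {"feature": 1, "threshold": 0.5,
          "left": {"score": -0.25},
          "right": {"score": 0.25}}


def leaf_score(tree, x):
    """Walk from the root to the leaf that sample x falls into."""
    node = tree
    while "score" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["score"]


def predict_score(trees, x):
    """The ensemble prediction is the sum of one leaf score per tree."""
    return sum(leaf_score(tree, x) for tree in trees)


sample = [3.0, 0.0]
total = predict_score([tree_1, tree_2], sample)
```

For this sample, the first tree contributes its right leaf (feature 0 exceeds 2.0) and the second tree contributes its left leaf (feature 1 does not exceed 0.5), so the total is the sum of those two scores.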
A CART regression tree is a binary tree built by repeatedly splitting on features. For example, if the current node is split on the j-th feature at split value s, samples whose j-th feature value is at most s go to the left subtree, and samples whose j-th feature value is greater than s go to the right subtree:
$$R_1(j,s) = \{x \mid x^{(j)} \le s\} \quad \text{and} \quad R_2(j,s) = \{x \mid x^{(j)} > s\}.$$
A CART regression tree essentially partitions the sample space along the feature dimensions. Finding the optimal partition is an NP-hard problem, so decision tree models solve it heuristically. The objective function of a typical CART regression tree is:
$$\sum_{x_i \in R_m} \left(y_i - f(x_i)\right)^2$$
where R_m is a leaf region and f(x_i) is the value the tree predicts for sample x_i. Solving for the optimal splitting feature j and the optimal split point s therefore amounts to solving the following objective:
$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$
where c_1 and c_2 are the constant outputs of the two regions. Thus the optimal splitting feature and split point can be found simply by traversing all split points of all features, which finally yields a regression tree.
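The exhaustive search over features and split points can be sketched as follows. This is a minimal illustration of the traversal using squared error, with each region's constant output set to the mean of its targets; the function names and the toy data are assumptions made for the example.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (feature index j, split point s, total squared error)."""
    best = None
    for j in range(len(X[0])):
        # Candidate split points: every distinct value except the largest,
        # so that the right region is never empty.
        for s in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            loss = sse(left) + sse(right)
            if best is None or loss < best[2]:
                best = (j, s, loss)
    return best


# Toy data: feature 0 separates the targets perfectly at s = 2.
X = [[1, 7], [2, 3], [3, 8], [4, 2]]
y = [0.0, 0.0, 1.0, 1.0]
j, s, loss = best_split(X, y)
```

The search is quadratic in the number of samples as written; production tree learners sort each feature once and update the sums incrementally, but the result of the traversal is the same.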
The XGBoost objective function is defined as:
$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$$
The objective function consists of two parts: the first measures the difference between the predicted score and the true score, and the second is a regularization term. The regularization term itself has two parts, where T is the number of leaf nodes and w is the vector of leaf node scores. γ controls the number of leaf nodes, and λ keeps the leaf scores from becoming too large, which prevents overfitting.
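Each newly generated tree fits the residual of the previous prediction; that is, after t trees have been generated, the prediction score can be written as:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$
The objective function can accordingly be rewritten as:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}$$
The next task is to find the f_t that minimizes this objective function. The idea of XGBoost is to approximate the objective by its second-order Taylor expansion at f_t = 0. The objective function is therefore approximated as:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \text{constant}$$
where g_i is the first derivative and h_i is the second derivative of the loss with respect to the previous prediction:
$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$
The loss between the prediction score of the first t-1 trees and y is a constant that does not affect the optimization of the objective function, so it can be dropped directly. The simplified objective function is:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$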
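The expression above is a sum of loss values over all samples. Each sample ultimately falls into exactly one leaf node, so the samples can be regrouped by leaf node as follows:
$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2 \right] + \gamma T$$
where I_j is the set of samples that fall into leaf j and w_j is the score of that leaf. After this rewriting, the objective function is a univariate quadratic function of each leaf score w_j, so the optimal w and the optimal objective value follow directly from the vertex formula. Writing G_j for the sum of g_i over I_j and H_j for the sum of h_i over I_j, the optimal w and the objective value are:
$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad Obj^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$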
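A small numeric sketch may help make the closed-form leaf solution concrete: for one leaf, the optimal score is minus the summed first derivatives divided by the summed second derivatives plus λ, and the leaf's contribution to the objective is minus half the squared sum of first derivatives over the same denominator (the γ penalty per leaf is added separately). The source does not state which loss function is used; the logistic loss below, the value of λ and the sample labels are assumptions chosen only for illustration.

```python
import math


def logistic_grad_hess(y_true, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score.

    Assumes labels in {0, 1}; the choice of loss is an assumption, since the
    source text does not name one.
    """
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)


def optimal_leaf(g_list, h_list, lam):
    """Return (optimal leaf score, this leaf's term of the objective).

    Optimal score:   -G / (H + lam)
    Objective term:  -0.5 * G^2 / (H + lam)   (gamma per leaf not included)
    """
    G, H = sum(g_list), sum(h_list)
    return -G / (H + lam), -0.5 * G * G / (H + lam)


# Three samples in one leaf, all starting from a raw score of 0 (p = 0.5).
labels = [1, 1, 0]
grads, hesses = zip(*(logistic_grad_hess(y, 0.0) for y in labels))
w_star, leaf_obj = optimal_leaf(grads, hesses, lam=1.0)
```

Here two of the three samples are faulty, so the summed gradient is negative and the optimal leaf score comes out positive, pushing the prediction for this leaf toward the fault class. A larger λ shrinks the score toward zero.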
XGBoost modeling flow:
(1) A first tree is trained on the training data. The result is a tree of limited accuracy. After the evaluation function measures its error, generation of the second tree begins.
(2) The target of the second tree is not the original target but the residual between the target and the prediction of the previous tree.
(3) The learning target of the third tree is the total residual left by the first two trees, so that, in theory, the total error is reduced step by step.
(4) The iteration is repeated and stops once the preset maximum number of iterations n is reached.
(5) Finally, the values of all the trees trained so far are summed to give the final training result.
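The flow above can be sketched as a residual-fitting loop. Note that this is a simplified stand-in, not XGBoost itself: it uses plain gradient boosting with squared error, one-feature decision stumps, and no regularization or second-order terms. The function names, the toy data, the number of rounds and the learning rate are all assumptions.

```python
def fit_stump(x, residuals):
    """Best single-threshold split on one feature, minimising squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for s in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= s]
        right = [r for xi, r in zip(x, residuals) if xi > s]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, s, lmean, rmean)
    _, s, lmean, rmean = best
    return lambda xi: lmean if xi <= s else rmean


def boost(x, y, n_rounds, learning_rate):
    """Fit n_rounds stumps, each to the residuals left by the previous ones."""
    trees, pred, errors = [], [0.0] * len(y), []
    for _ in range(n_rounds):
        # Steps (2) and (3): the new target is the current total residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        # Step (5): the running prediction is the sum over all trees so far.
        pred = [pi + learning_rate * tree(xi) for xi, pi in zip(x, pred)]
        errors.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)))
    return trees, pred, errors


x = [1, 2, 3, 4, 5, 6]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
# Step (4): stop after a fixed maximum number of iterations.
trees, pred, errors = boost(x, y, n_rounds=5, learning_rate=0.5)
```

On this toy data the squared error shrinks with every round, which is the step-by-step error reduction that item (3) describes.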
The invention trains the model on historical sorting data of returned meters and obtains the following optimal XGBoost model parameters:
| Learning rate (learning_rate) | 0.247061 |
| Minimum number of data points in one leaf (min_data) | 50 |
| Minimum sum of sample weights in a leaf node (min_hessian) | 0.409273 |
| Maximum number of leaves of a tree (num_leaves) | 92 |
| Fraction of features used in each iteration (sub_feature) | 0.814067 |
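For use in a Python training script, the tuned values can be held in a single dictionary. The sketch below keeps the parameter names as listed in the text; the source does not show the training call they are passed to, so none is shown here. The dictionary layout and the range checks in `check_params` are assumptions added for illustration.

```python
# Tuned values as listed in the text; the dict layout itself is an assumption.
best_params = {
    "learning_rate": 0.247061,  # shrinkage applied to each new tree
    "min_data": 50,             # minimum number of data points in one leaf
    "min_hessian": 0.409273,    # minimum sum of sample weights in a leaf
    "num_leaves": 92,           # maximum number of leaves per tree
    "sub_feature": 0.814067,    # fraction of features used in each iteration
}


def check_params(params):
    """Basic sanity checks (assumed ranges); returns a list of problems."""
    problems = []
    if not 0.0 < params["learning_rate"] <= 1.0:
        problems.append("learning_rate should be in (0, 1]")
    if not 0.0 < params["sub_feature"] <= 1.0:
        problems.append("sub_feature should be in (0, 1]")
    if params["num_leaves"] < 2:
        problems.append("num_leaves should be at least 2")
    if params["min_data"] < 1:
        problems.append("min_data should be at least 1")
    if params["min_hessian"] < 0.0:
        problems.append("min_hessian should be non-negative")
    return problems


problems = check_params(best_params)
```

Checking the values once at load time catches transcription slips, such as a misplaced decimal point, before a long training run starts.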
Step four, application of the results.
The learned rules are applied to the in-service meter data processed in step two to judge whether each in-service electric energy meter will fail. That is, the dismantled-meter fault rules learned by the XGBoost model are compared against the data of in-service electric energy meters to judge whether each in-service meter is faulty.
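As an illustration of this step, the model's raw score for each in-service meter can be converted into a fault judgment. The sketch below assumes a binary logistic output and a decision threshold of 0.5; neither is stated in the source, and the bar codes and scores are made up.

```python
import math


def fault_probability(raw_score):
    """Map a summed tree score to a probability (assumes a logistic output)."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def flag_meters(scores_by_bar_code, threshold=0.5):
    """Return (bar_code, probability) for meters judged faulty, riskiest first.

    The 0.5 threshold is an assumption; it can be raised to cut false alarms
    or lowered to catch more faults.
    """
    flagged = []
    for bar_code, raw in scores_by_bar_code.items():
        p = fault_probability(raw)
        if p >= threshold:
            flagged.append((bar_code, p))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for three in-service meters.
scores = {"M-001": 1.2, "M-002": -0.8, "M-003": 0.1}
flagged = flag_meters(scores)
```

Sorting the flagged meters by probability gives maintenance staff a natural inspection order.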
Step five, visualization.
The results are displayed visually.
The technical solution of the present invention is further described below by way of an example. The calculation process is shown in Fig. 1. First, the features of each electric energy meter are obtained through data preprocessing. The features of the dismantled and sorted meters, together with their sorting results, are input into the model; the model learns from these data the rules that determine whether an electric energy meter is faulty, and the model parameters are tuned to obtain the optimal parameters.
Table 1. Example of model input data
Some electric energy meters are selected from the production scheduling platform system, their features are input into the XGBoost model, and the model predicts whether each meter is faulty. The model was trained on the returned-meter sorting data of a certain province from before 2018, and its prediction accuracy was checked against the returned-meter sorting data of March 2019, giving an accuracy of 0.72. Some of the results are shown in Table 2 below.
Table 2. Prediction results for a sample of electric energy meters
| No. | BAR_CODE | Sorting result | Predicted result |
| 1 | 4130001000000043518932 | Fault | Fault |
| 2 | 4130001000000021844800 | Fault | Fault |
| 3 | 4130001000000021891170 | Fault | Fault |
| 4 | 4130001000000044700794 | Fault | Fault |
| 5 | 4130001000000030078616 | Fault | Fault |
| 6 | 4130001000000044705775 | Fault | Fault |
| 7 | 4130001000000043607117 | Fault | Fault |
| 8 | 4130001000000080078642 | Fault | Fault |
| 9 | 4130001000000044700725 | Fault | Fault |
| 10 | 4130001000000105076240 | No fault | No fault |
| 11 | 4130001000000081833844 | No fault | No fault |
| 12 | 4130001000000064720529 | No fault | No fault |
| 13 | 4130001000000064770579 | No fault | No fault |
| 14 | 4130001000000077398777 | No fault | No fault |
| 15 | 4130001000000178304530 | No fault | No fault |
| 16 | 4130001000000188326645 | No fault | No fault |
| 17 | 4130001000000041703781 | No fault | No fault |
| 18 | 4130001000000098329927 | No fault | No fault |
| 19 | 4130001000000044769258 | No fault | Fault |
| 20 | 4130001000000044864540 | No fault | Fault |
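The agreement between sorting results and predictions in the 20 rows of Table 2 can be tallied with a short script. The sketch below encodes the table as pairs of labels (the label strings are an encoding chosen for the example) and counts matches. It covers only this excerpt and is not the check that produced the reported accuracy of 0.72.

```python
# (sorting result, predicted result) for rows 1 to 20 of Table 2.
rows = (
    [("fault", "fault")] * 9            # rows 1-9
    + [("no fault", "no fault")] * 9    # rows 10-18
    + [("no fault", "fault")] * 2       # rows 19-20
)

correct = sum(1 for actual, predicted in rows if actual == predicted)
accuracy = correct / len(rows)

# Meters predicted faulty that sorting found healthy, and the reverse.
false_alarms = sum(1 for a, p in rows if a == "no fault" and p == "fault")
missed_faults = sum(1 for a, p in rows if a == "fault" and p == "no fault")
```

In this excerpt 18 of the 20 predictions match the sorting result, and both mismatches are false alarms rather than missed faults.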
The invention can be implemented as a functional module: a computer program is written according to the principle and flow chart of the invention and deployed on an operation server, where it processes the relevant data to produce the results. The specific implementation framework is shown in Fig. 2.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents or improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.