CN114186844A - Method and device for identifying electricity stealing clients - Google Patents

Info

Publication number
CN114186844A
Authority
CN
China
Prior art keywords
electricity stealing
client
training
tree
decision tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111500444.8A
Other languages
Chinese (zh)
Inventor
宫立华
赵振东
朱克
朱龙珠
朱静
翟雪敏
聂玲
左华林
单金宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Co ltd Customer Service Center
Beijing China Power Information Technology Co Ltd
Original Assignee
State Grid Co ltd Customer Service Center
Beijing China Power Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Co ltd Customer Service Center, Beijing China Power Information Technology Co Ltd
Priority to CN202111500444.8A
Publication of CN114186844A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a device for identifying electricity stealing clients. An electricity stealing client index system is constructed to improve the comprehensiveness and accuracy of the data to be identified, and whichever of an identification model based on an LM (Levenberg-Marquardt) neural network and an identification model based on a CART (classification and regression tree) decision tree identifies electricity stealing clients best is used as the electricity stealing client identification model.

Description

Method and device for identifying electricity stealing clients
Technical Field
The invention relates to the technical field of data analysis, in particular to a method and a device for identifying a power stealing client.
Background
In recent years, electricity stealing has continued to occur from time to time, seriously affecting normal power supply and safe electricity use.
At present, electricity stealing identification relies mainly on manual patrol combined with index analysis. Manual patrol requires a large amount of manpower and is inefficient, while index analysis must process massive amounts of data, so identification is difficult and its accuracy is low.
Disclosure of Invention
In view of this, the invention provides a method and a device for identifying a power stealing client, which can accurately identify the power stealing client.
In order to achieve the above purpose, the invention provides the following specific technical scheme:
a power stealing client identification method, comprising:
acquiring a multi-dimensional index value of a customer to be identified under a pre-constructed electricity stealing customer index system;
inputting the multidimensional index values into a pre-constructed electricity stealing client identification model to obtain an identification result of the client to be identified, wherein the electricity stealing client identification model is whichever of an identification model based on an LM (Levenberg-Marquardt) neural network and an identification model based on a CART (classification and regression tree) decision tree has the best identification effect on electricity stealing clients, the identification model based on the LM neural network and the identification model based on the CART decision tree are obtained by respectively training an LM neural network model and a CART decision tree model with training samples labeled as electricity stealing clients or not, and the training samples comprise the multidimensional index values under the electricity stealing client index system.
Optionally, the obtaining a multidimensional index value of the customer to be identified under a pre-constructed electricity stealing customer index system includes:
acquiring power consumption data of the customer to be identified in a preset time period;
extracting data corresponding to the multidimensional indexes under the electricity stealing customer index system from the electricity utilization data;
data corresponding to the multidimensional indexes under the electricity stealing client index system are cleaned, and a Lagrange interpolation method is adopted to perform interpolation processing on the cleaned missing values;
and normalizing the data corresponding to the multidimensional indexes under the electricity stealing client index system after interpolation processing to obtain the multidimensional index values.
Optionally, the method further includes:
determining the number of neurons of an input layer in the LM neural network according to the number of multi-dimensional indexes in the electricity stealing client index system, and determining the number of neurons of an output layer in the LM neural network as 1;
carrying out multiple convergence training on the LM neural network by using the training samples by using different initial training parameters;
acquiring the precision of a training set and the precision of a test set after each convergence training, and respectively carrying out weighted average calculation on the precision of the training set and the precision of the test set after each convergence training according to the preset precision weight of the training set and the preset precision weight of the test set to obtain the precision value of each convergence training;
and determining the network parameter corresponding to the convergence training with the highest precision value as the optimal network parameter to obtain the identification model based on the LM neural network corresponding to the optimal network parameter.
Optionally, the method further includes:
selecting, according to preset rules, one multidimensional index or a combination of several multidimensional indexes from the electricity stealing client index system as the division attribute of a tree node, dividing the training samples into branches according to that attribute, and repeating the process until one of the preset conditions is met, whereupon tree building stops and a CART decision tree is generated;
pruning the generated CART decision tree by using a pruning algorithm to form a sub-tree sequence;
testing the sub-tree sequences on an independent verification data set through a cross verification method, and selecting an optimal sub-tree from the sub-tree sequences as the CART decision tree-based recognition model;
wherein the preset conditions include:
the number of samples in all leaf nodes of the CART decision tree is 1 or the samples belong to the same class;
the CART decision tree height reaches a threshold set by the user.
Optionally, the method further includes:
respectively counting the recognition accuracy of the recognition model based on the LM neural network and the recognition model based on the CART decision tree;
and determining the model with the highest identification accuracy as the electricity stealing client identification model.
Optionally, the method further includes:
respectively drawing ROC curves of the identification model based on the LM neural network and the identification model based on the CART decision tree;
and determining the model with the maximum AUC under the ROC curve as the electricity stealing customer identification model.
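The AUC-based selection rule can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the labels, the model scores, and the helper names `roc_auc` and `select_model` are hypothetical and do not come from the disclosure. AUC is computed here through its pairwise-ranking interpretation (the fraction of positive/negative pairs in which the positive sample receives the higher score), which equals the area under the ROC curve.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_model(labels, scores_by_model):
    """Return the name of the model with the largest AUC, plus all AUC values."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs


# Hypothetical validation labels (1 = electricity stealing client) and model scores.
labels = [1, 0, 1, 0, 1, 0]
scores = {
    "lm_network": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "cart_tree": [0.9, 0.6, 0.5, 0.4, 0.7, 0.1],
}
best_name, aucs = select_model(labels, scores)
print(best_name, aucs)
```

The same `select_model` shape would work for the accuracy-based rule by swapping `roc_auc` for an accuracy function.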
A power stealing client identifying device comprising:
a to-be-identified client data acquisition unit, configured to acquire multi-dimensional index values of a client to be identified under a pre-constructed electricity stealing client index system;
the electricity stealing client identification unit is used for inputting the multi-dimensional index values into a pre-constructed electricity stealing client identification model to obtain the identification result of the client to be identified, wherein the electricity stealing client identification model is the model with the best identification effect on the electricity stealing client in the identification model based on the LM neural network and the identification model based on the CART decision tree, the identification model based on the LM neural network and the identification model based on the CART decision tree are obtained by respectively training the LM neural network model and the CART decision tree model by utilizing training samples which are marked as electricity stealing clients or not, and the training samples comprise the multi-dimensional index values under the electricity stealing client index system.
Optionally, the to-be-identified client data obtaining unit is specifically configured to:
acquiring power consumption data of the customer to be identified in a preset time period;
extracting data corresponding to the multidimensional indexes under the electricity stealing customer index system from the electricity utilization data;
data corresponding to the multidimensional indexes under the electricity stealing client index system are cleaned, and a Lagrange interpolation method is adopted to perform interpolation processing on the cleaned missing values;
and normalizing the data corresponding to the multidimensional indexes under the electricity stealing client index system after interpolation processing to obtain the multidimensional index values.
Optionally, the apparatus further includes a first model building unit, specifically configured to:
determining the number of neurons of an input layer in the LM neural network according to the number of multi-dimensional indexes in the electricity stealing client index system, and determining the number of neurons of an output layer in the LM neural network as 1;
carrying out multiple convergence training on the LM neural network by using the training samples by using different initial training parameters;
acquiring the precision of a training set and the precision of a test set after each convergence training, and respectively carrying out weighted average calculation on the precision of the training set and the precision of the test set after each convergence training according to the preset precision weight of the training set and the preset precision weight of the test set to obtain the precision value of each convergence training;
and determining the network parameter corresponding to the convergence training with the highest precision value as the optimal network parameter to obtain the identification model based on the LM neural network corresponding to the optimal network parameter.
Optionally, the apparatus further includes a second model building unit, specifically configured to:
selecting, according to preset rules, one multidimensional index or a combination of several multidimensional indexes from the electricity stealing client index system as the division attribute of a tree node, dividing the training samples into branches according to that attribute, and repeating the process until one of the preset conditions is met, whereupon tree building stops and a CART decision tree is generated;
pruning the generated CART decision tree by using a pruning algorithm to form a sub-tree sequence;
testing the sub-tree sequences on an independent verification data set through a cross verification method, and selecting an optimal sub-tree from the sub-tree sequences as the CART decision tree-based recognition model;
wherein the preset conditions include:
the number of samples in all leaf nodes of the CART decision tree is 1 or the samples belong to the same class;
the CART decision tree height reaches a threshold set by the user.
Optionally, the apparatus further comprises:
the recognition model evaluation unit is used for respectively counting the recognition accuracy of the recognition model based on the LM neural network and the recognition model based on the CART decision tree; and determining the model with the highest identification accuracy as the electricity stealing client identification model.
Optionally, the apparatus further comprises:
the identification model evaluation unit is used for respectively drawing ROC curves of the identification model based on the LM neural network and the identification model based on the CART decision tree; and determining the model with the maximum AUC under the ROC curve as the electricity stealing customer identification model.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses an electricity stealing client identification method that improves the comprehensiveness and accuracy of the data to be identified by constructing an electricity stealing client index system, and that takes whichever of an identification model based on an LM (Levenberg-Marquardt) neural network and an identification model based on a CART (classification and regression tree) decision tree identifies electricity stealing clients best as the electricity stealing client identification model. The LM neural network has the local convergence of the Gauss-Newton method and the global characteristic of the gradient descent method. The CART decision tree is efficient: the number of computations per prediction does not exceed the depth of the decision tree, and the decision tree makes no assumption about the statistical distribution of the input data. Identifying electricity stealing clients with the better of the two models therefore yields a global optimal solution with fast convergence and high precision, which improves the efficiency of electricity stealing client identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for identifying a power stealing client according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an electricity stealing client identification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The inventor found through research that the traditional BP neural network algorithm is a local-search optimization method: it is easily trapped in local extrema, converges slowly, and is very sensitive to the initial network weights. The logistic regression algorithm, in turn, has low accuracy and has difficulty fitting the true distribution of the data.
On this basis, the invention takes whichever of the identification model based on the LM neural network and the identification model based on the CART decision tree identifies electricity stealing clients best as the electricity stealing client identification model. The LM neural network has the local convergence of the Gauss-Newton method and the global characteristic of the gradient descent method. The CART decision tree is efficient: the number of computations per prediction does not exceed the depth of the decision tree, and the decision tree makes no assumption about the statistical distribution of the input data. Identifying electricity stealing clients with the better of the two models therefore yields a global optimal solution with fast convergence and high precision, which improves the efficiency of electricity stealing client identification.
Specifically, referring to fig. 1, the method for identifying a power stealing client disclosed by the embodiment of the present invention includes the following steps:
s101: acquiring a multi-dimensional index value of a customer to be identified under a pre-constructed electricity stealing customer index system;
The invention takes the electricity acquisition terminal and the electric energy meter as research objects and, in combination with the business scenario, determines the data requirements according to the business content. The data come from the electricity information acquisition system and the marketing business application system and include acquisition points, running electric energy meters, daily measurement point power curves, daily measurement point voltage curves, daily measurement point current curves, measurement point daily frozen electric energy information, electricity consumers, metering points, power supply units, electricity utilization addresses, and default electricity use and electricity stealing information.
When data is accessed, its quality is checked for integrity, duplication, accuracy and the like, and the data is cleaned and its missing values are interpolated.
Data distribution analysis and data periodicity analysis are then performed on the processed data. Data distribution analysis examines the time periods covered by the accessed historical electricity stealing data and customer electricity utilization data, and counts the distribution of electricity stealing customers in each electricity utilization category. Data periodicity analysis draws time series trend graphs of the power consumption of normal customers and electricity stealing customers, compares the periodic changes of the two types of customers, and finds the rules and characteristics of their power consumption data distributions.
And constructing the following electricity stealing client index system from 7 dimensions of voltage class, current class, electric quantity class, abnormal event class, historical electricity stealing class, phase sequence class and archive class according to the analysis result, wherein the index system represents the law of the electricity stealing client behavior.
[Table: electricity stealing client index system (two images, not reproduced)]
The magnetic field abnormal events and the electric energy meter uncapping events come from the electricity utilization information acquisition system and include high-frequency magnetic field abnormal events, power frequency magnetic field abnormal events, strong magnetic field abnormal events, the number of uncapping events and the mean uncapping duration. The electric equipment defect events and the abnormal electricity price execution events come from the marketing business application system and include self-contained emergency power supply events, high-voltage cable exposure events, fixed-ratio and fixed-quantity execution errors, and high-voltage supply with low-voltage metering.
After an electricity stealing client index system is built, obtaining a multi-dimensional index value of a client to be identified under the pre-built electricity stealing client index system, which specifically comprises the following steps:
acquiring power consumption data of a customer to be identified in a preset time period;
extracting data corresponding to the multidimensional indexes under the electricity stealing client index system from the electricity utilization data;
data corresponding to the multidimensional indexes under the electricity stealing client index system are cleaned, and a Lagrange interpolation method is adopted to perform interpolation processing on the cleaned missing values;
and normalizing the interpolated data corresponding to the multidimensional indexes under the electricity stealing client index system, mapping them to [-1, 1], to obtain the multidimensional index values.
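The disclosure does not state which normalization maps the data to [-1, 1]. The following minimal sketch assumes plain min-max scaling per index; the function name `scale_to_unit_interval`, the sample readings, and the handling of a constant column are assumptions made for illustration only.

```python
def scale_to_unit_interval(values):
    """Min-max scale a list of numbers linearly onto [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant column carries no information, map it to 0.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical daily electricity readings for one index.
daily_kwh = [12.0, 15.0, 18.0, 24.0]
scaled = scale_to_unit_interval(daily_kwh)
print(scaled)
```

In practice the minimum and maximum would be taken from the training samples and reused for the client to be identified, so that both are scaled consistently.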
The missing values are interpolated by the Lagrange interpolation method. The specific method is as follows:
first, the dependent variable and the independent variable are determined from the original data; second, the 5 data points before and the 5 data points after the missing value are taken out (any of these points that are absent or empty are discarded, and only the valid data are kept); the valid points among these 10 are then combined into one group. The Lagrange formula is as follows:
$$L_n(x) = \sum_{i=0}^{n} y_i \, l_i(x)$$
$$l_i(x) = \prod_{j=0,\ j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$
Finally, interpolation is performed with the function lagrange(y.index, list(y)) and the interpolation result is returned.
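The interpolation procedure can be sketched in a few lines of Python. The `lagrange(y.index, list(y))` call in the text has the same shape as `scipy.interpolate.lagrange` applied to a pandas Series, although the disclosure does not name a library; to stay dependency-free, this sketch evaluates the Lagrange polynomial directly. The names `lagrange_value` and `fill_missing` and the sample readings are hypothetical.

```python
def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def fill_missing(series, k=5):
    """Fill None entries using up to k valid neighbours on each side."""
    filled = list(series)
    for n, value in enumerate(series):
        if value is not None:
            continue
        window = range(max(0, n - k), min(len(series), n + k + 1))
        # Neighbours that are themselves missing are simply dropped.
        xs = [i for i in window if series[i] is not None]
        ys = [series[i] for i in xs]
        if not xs:
            continue  # no valid neighbours, leave the gap as it is
        filled[n] = lagrange_value(xs, ys, n)
    return filled


# Hypothetical readings that follow (x + 1)^2 with one gap at position 2.
readings = [1.0, 4.0, None, 16.0, 25.0]
result = fill_missing(readings)
print(result)
```

High-degree Lagrange polynomials can oscillate between points, which is one reason to keep the window small, as the 5-before and 5-after rule does.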
S102: inputting multidimensional index values into a pre-constructed electricity stealing client identification model to obtain an identification result of a client to be identified, wherein the electricity stealing client identification model is a model with the best identification effect on the electricity stealing client in the identification model based on the LM neural network and the identification model based on the CART decision tree, the identification model based on the LM neural network and the identification model based on the CART decision tree are obtained by respectively training the LM neural network model and the CART decision tree model by utilizing a training sample which is marked as the electricity stealing client or not, and the training sample comprises the multidimensional index values under an electricity stealing client index system.
The identification model based on the LM neural network is the same as the training sample of the identification model based on the CART decision tree, and the training sample comprises multi-dimensional index values under a power stealing client index system.
The LM algorithm is a fast algorithm using standard numerical optimization techniques. The LM algorithm is an improved form of the Gauss-Newton method, is a combination of gradient descent and the Gauss-Newton method, and has the local convergence of the Gauss-Newton method and the global characteristic of the gradient descent method. Because it uses approximate second derivative information, convergence speed is much faster than that of the gradient method, and the algorithm is stable.
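In the BP neural network, let $x_k$ denote the vector formed by the weights and thresholds at the $k$-th iteration. The new vector $x_{k+1}$ of weights and thresholds is obtained by the following rule:
$$x_{k+1} = x_k + \Delta x \tag{1}$$
For Newton's method:
$$\Delta x = -[\nabla^2 E(x)]^{-1} \nabla E(x) \tag{2}$$
where $\nabla E(x)$ is the gradient and $\nabla^2 E(x)$ is the Hessian matrix of the error index function $E(x)$.
Let the error index function be:
$$E(x) = \frac{1}{2} \sum_{i=1}^{N} e_i^2(x) \tag{3}$$
In formula (3), $e_i(x)$ is the error ($i = 1, 2, \ldots, N$). Then:
$$\nabla E(x) = J^T(x)\, e(x) \tag{4}$$
$$\nabla^2 E(x) = J^T(x) J(x) + S(x) \tag{5}$$
In formulas (4) and (5), $S(x) = \sum_{i=1}^{N} e_i(x) \nabla^2 e_i(x)$ and $J(x)$ is the Jacobian matrix, namely:
$$J(x) = \begin{bmatrix} \dfrac{\partial e_1(x)}{\partial x_1} & \dfrac{\partial e_1(x)}{\partial x_2} & \cdots & \dfrac{\partial e_1(x)}{\partial x_n} \\ \dfrac{\partial e_2(x)}{\partial x_1} & \dfrac{\partial e_2(x)}{\partial x_2} & \cdots & \dfrac{\partial e_2(x)}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial e_N(x)}{\partial x_1} & \dfrac{\partial e_N(x)}{\partial x_2} & \cdots & \dfrac{\partial e_N(x)}{\partial x_n} \end{bmatrix} \tag{6}$$
The Gauss-Newton method, which neglects $S(x)$, computes:
$$\Delta x = -[J^T(x) J(x)]^{-1} J^T(x)\, e(x) \tag{7}$$
The LM algorithm is an improvement of it, namely:
$$\Delta x = -[J^T(x) J(x) + \mu I]^{-1} J^T(x)\, e(x) \tag{8}$$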
In formula (8), the proportionality coefficient $\mu > 0$ is a constant and $I$ is the identity matrix. When $\mu = 0$, the Gauss-Newton method is obtained; when $\mu$ is large, the method approaches gradient descent.
Practice shows that the LM algorithm can be dozens or even hundreds of times faster than the original gradient descent method. In addition, because $J^T(x)J(x) + \mu I$ is positive definite, the solution of equation (8) always exists, so in this respect the LM algorithm is also better than the Gauss-Newton method.
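The two limiting cases and the positive-definiteness claim can be checked with a short derivation. It uses only the symbols of formulas (4) and (8); $v$ is an arbitrary nonzero vector introduced here for the argument.

```latex
\begin{align*}
% Positive definiteness: for any v \neq 0 and \mu > 0
v^T \left( J^T J + \mu I \right) v &= \lVert J v \rVert^2 + \mu \lVert v \rVert^2 > 0, \\
% Small-\mu limit: the damping term vanishes and the Gauss-Newton step remains
\mu \to 0: \quad \Delta x &\to -\left( J^T J \right)^{-1} J^T e, \\
% Large-\mu limit: \mu I dominates J^T J
\mu \gg \lVert J^T J \rVert: \quad \Delta x &\approx -\frac{1}{\mu} J^T e = -\frac{1}{\mu} \nabla E(x).
\end{align*}
```

The last line is a gradient descent step with step size $1/\mu$, which is why increasing $\mu$ after a failed step makes the update both shorter and closer to the steepest descent direction.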
The method for constructing the identification model based on the LM neural network comprises the following steps:
According to Occam's razor as applied to neural networks, a larger network should not be used in network modeling when a smaller network can do the job. A network with the minimum of 3 layers is therefore used, namely an input layer, a hidden layer and an output layer, where the hidden layer and the output layer use the ReLU activation function.
The numbers of neurons in the input layer and the output layer are determined by the types of input and output elements in the training set. The network takes 49 main influence factors of a user as input and whether the client is an electricity stealing client as output, so the input and output layers have 49 and 1 neurons respectively.
The number of hidden neurons is usually determined by the empirical formula
$$n_1 = \sqrt{n + m} + a$$
where $n_1$ is the number of hidden neurons, $n$ is the number of neurons in the input layer, $m$ is the number of neurons in the output layer, and $a$ is a constant between 1 and 10. The number of hidden neurons is varied from small to large around the calculated value of $n_1$, and 10 hidden neurons are selected: at this size, adding neurons no longer affects the network error, and the network is not so large relative to the training data that overtraining occurs.
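For the network described here ($n = 49$, $m = 1$), the empirical formula gives a search range rather than a single value. The sketch below lists the integer candidates inside that range; the function name `hidden_neuron_candidates` and the rounding convention (keeping integers that lie inside the real-valued interval) are assumptions, since the disclosure does not say how non-integer values are rounded.

```python
import math


def hidden_neuron_candidates(n_inputs, n_outputs, a_min=1, a_max=10):
    """Integers inside [sqrt(n + m) + a_min, sqrt(n + m) + a_max]."""
    base = math.sqrt(n_inputs + n_outputs)
    low = math.ceil(base + a_min)
    high = math.floor(base + a_max)
    return list(range(low, high + 1))


candidates = hidden_neuron_candidates(49, 1)
print(candidates)
```

The value of 10 hidden neurons chosen in the text lies inside this candidate range.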
Model training is then performed on the training set with the constructed model. The network is trained to convergence several times with different initial training parameters, and the best run is chosen by the weighted average of the training set precision and the test set precision, where the weights can be set according to the amounts of training set data and test set data. The network parameters of the best run are taken as the final result of model training.
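The run-selection rule can be sketched as follows. The sketch assumes that the two weights are proportional to the sizes of the training and test sets, which is one way to read "set according to the quantity" of data; the run records, set sizes, and the name `weighted_precision` are hypothetical.

```python
def weighted_precision(train_acc, test_acc, n_train, n_test):
    """Weighted average of two precisions, weights proportional to set sizes (assumed)."""
    total = n_train + n_test
    return (n_train / total) * train_acc + (n_test / total) * test_acc


# Hypothetical results of three convergence runs with different initial parameters.
runs = [
    {"seed": 0, "train_acc": 0.95, "test_acc": 0.80},
    {"seed": 1, "train_acc": 0.90, "test_acc": 0.90},
    {"seed": 2, "train_acc": 0.94, "test_acc": 0.88},
]
n_train, n_test = 800, 200

best = max(
    runs,
    key=lambda r: weighted_precision(r["train_acc"], r["test_acc"], n_train, n_test),
)
best_score = weighted_precision(best["train_acc"], best["test_acc"], n_train, n_test)
print(best["seed"], best_score)
```

Weighting by set size favours the training set whenever it is the larger one, so a run that overfits can still score well; giving the test set a larger preset weight is an equally valid choice under the text.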
The calculation steps of the LM neural network are as follows:
(1) Give the allowable value $\varepsilon$ of the model training error and the coefficients $\beta$ and $\mu_0$, and initialize the weight and threshold vector $x_0$. Let $k = 0$ and $\mu = \mu_0$.
(2) Compute the network output and the error index function $E(x_k)$.
(3) Compute the Jacobian matrix $J(x_k)$ according to formula (6).
(4) Compute $\Delta x$ and $E(x_k)$ according to formula (8) and formula (3) respectively.
(5) If $E(x_k) < \varepsilon$, stop; otherwise, compute the error index function $E(x_{k+1})$ with $x_{k+1} = x_k + \Delta x$ as the weights and thresholds.
(6) If $E(x_{k+1}) < E(x_k)$, let $k = k + 1$ and $\mu = \mu / \beta$, and return to step (2); otherwise, do not update the weights and thresholds, let $x_{k+1} = x_k$ and $\mu = \mu \beta$, and return to step (4).
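Steps (1) to (6) can be traced on a toy problem. This sketch fits the two-parameter model $a \cdot e^{bt}$ instead of a neural network, so that the Jacobian and the 2x2 linear solve fit in a few lines of plain Python; the model, the data, the starting point, and all function names are assumptions chosen for illustration, not the network of the disclosure.

```python
import math


def residuals(x, ts, ys):
    """Errors e_i(x) of the toy model a * exp(b * t) against the data."""
    a, b = x
    return [a * math.exp(b * t) - y for t, y in zip(ts, ys)]


def jacobian(x, ts):
    """Rows (de_i/da, de_i/db), as in formula (6)."""
    a, b = x
    return [(math.exp(b * t), a * t * math.exp(b * t)) for t in ts]


def error_index(e):
    """E(x) = 1/2 * sum of squared errors, as in formula (3)."""
    return 0.5 * sum(ei * ei for ei in e)


def lm_step(J, e, mu):
    """Solve (J^T J + mu I) dx = -J^T e for two parameters, as in formula (8)."""
    h00 = sum(r[0] * r[0] for r in J) + mu
    h01 = sum(r[0] * r[1] for r in J)
    h11 = sum(r[1] * r[1] for r in J) + mu
    g0 = sum(r[0] * ei for r, ei in zip(J, e))
    g1 = sum(r[1] * ei for r, ei in zip(J, e))
    det = h00 * h11 - h01 * h01  # positive because the matrix is positive definite
    return (-(h11 * g0 - h01 * g1) / det, -(h00 * g1 - h01 * g0) / det)


def levenberg_marquardt(x0, ts, ys, eps=1e-10, beta=10.0, mu0=0.01, max_iter=200):
    x, mu = x0, mu0
    history = [error_index(residuals(x, ts, ys))]
    for _ in range(max_iter):
        e = residuals(x, ts, ys)
        E = error_index(e)
        if E < eps:                      # step (5): converged
            break
        dx = lm_step(jacobian(x, ts), e, mu)
        trial = (x[0] + dx[0], x[1] + dx[1])
        E_trial = error_index(residuals(trial, ts, ys))
        if E_trial < E:                  # step (6): accept and relax damping
            x, mu = trial, mu / beta
            history.append(E_trial)
        else:                            # step (6): reject and increase damping
            mu *= beta
    return x, history


# Noise-free toy data generated from a = 2, b = 0.5.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]
x_fit, history = levenberg_marquardt((1.0, 0.1), ts, ys)
print(x_fit, len(history))
```

Only accepted steps are recorded in `history`, so the recorded error index decreases monotonically, mirroring the accept and reject logic of step (6).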
The CART decision tree algorithm is a supervised learning algorithm, and the key point of the CART decision tree algorithm is to select the partition attribute of each node from available attributes so as to ensure that the CART decision tree is dividedThe highest classification precision is achieved. The tree growing process is a process of continuously dividing a data set, and for each division, the "difference" between data records divided into the same branch is minimum (namely, the data records belong to the same class), the "difference" between data records divided into different branches is maximum (namely, the data records belong to different classes), and an index for measuring the "difference" is called as the impurity. The CART decision tree uses "kini index" to select partition attributes, and the purity of a data set can be measured in kini values. For a given sample set D, assume that there are K classes, the number of kth classes being CkThen the expression for the kini coefficient of sample D is:
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{C_k}{|D|} \right)^2$$
In particular, if the sample set $D$ is divided into two parts $D_1$ and $D_2$ according to a certain value $a$ of feature $A$, then, under the condition of feature $A$, the Gini index of $D$ is:
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
$\mathrm{Gini}(D)$ indicates the impurity of the set $D$, and $\mathrm{Gini}(D, A)$ indicates the impurity of the set $D$ after it is divided by $A = a$. The greater the Gini index, the greater the impurity of the sample set.
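As a small worked example of the two quantities, the following sketch uses the standard CART definitions (one minus the sum of squared class proportions, and the size-weighted Gini value of the two parts). The function names, the attribute values and the labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini value of a label list: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(values, labels, a):
    """Gini index of the set after a binary split into A == a and A != a."""
    d1 = [y for v, y in zip(values, labels) if v == a]
    d2 = [y for v, y in zip(values, labels) if v != a]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


# Hypothetical data: 1 = electricity stealing client, 0 = normal client.
labels = [1, 1, 0, 0, 0, 0]
line_loss = ["high", "high", "high", "low", "low", "low"]

before = gini(labels)                          # 4/9, about 0.444
after = gini_split(line_loss, labels, "high")  # 2/9, about 0.222
```

Splitting on the hypothetical attribute halves the impurity here, so it would be preferred over any attribute whose split leaves the Gini index higher.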
Before the CART decision tree can be used for prediction, a training sample set must be provided to construct and evaluate it. The CART decision tree uses a training sample set of the following form:
$L = \{X_1, X_2, \ldots, X_m, Y\}$
$X_1 = (x_{11}, x_{12}, \ldots, x_{1t_1}), \ldots, X_m = (x_{m1}, x_{m2}, \ldots, x_{mt_n})$
$Y = (Y_1, Y_2, \ldots, Y_k)$
where $X_1, X_2, \ldots, X_m$ are called attribute vectors, whose attributes may be ordered or discrete; $Y$ is called the label vector, whose attributes may be ordered or discrete.
The CART decision tree selects one attribute, or a combination of several attributes, from the prediction attributes as the splitting attribute of a tree node, divides the test variable into branches, and repeats this process to build a sufficiently large classification tree.
The CART decision tree stops building trees when one of the following conditions is met:
1. the number of samples in all leaf nodes is 1 or the samples belong to the same class;
2. the decision tree height reaches a threshold set by the user.
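The growing procedure and its two stop conditions can be sketched as a short recursive builder. This is an illustrative simplification under stated assumptions: attributes are numeric, each split is a threshold on a single attribute chosen by the lowest weighted Gini index, and the dictionary layout, function names and sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (attribute index, threshold) with the lowest weighted Gini index."""
    best, best_score = None, float("inf")
    n = len(rows)
    for j in range(len(rows[0])):
        # Dropping the largest value keeps both sides of every split non-empty.
        for thr in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best, best_score = (j, thr), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Stop condition 1: a single sample, or all samples in the same class.
    # Stop condition 2: the user-set height threshold has been reached.
    if len(labels) == 1 or len(set(labels)) == 1 or depth >= max_depth:
        return {"leaf": True, "label": majority}
    split = best_split(rows, labels)
    if split is None:  # identical attribute vectors: nothing left to split on
        return {"leaf": True, "label": majority}
    j, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > thr]
    return {
        "leaf": False, "feature": j, "threshold": thr,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    while not tree["leaf"]:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]


# Hypothetical normalized index values; 1 = electricity stealing client.
rows = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.3), (0.9, 0.2)]
labels = [0, 0, 1, 1]
tree = build_tree(rows, labels)
```

Here a single split on the first attribute separates the two classes, so both children stop immediately under condition 1.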
After the decision tree is generated, it needs to be pruned with a pruning algorithm to select an optimal subtree. First, the decision tree $T_0$ produced by the generation algorithm is pruned continuously from the bottom up until only the root node of $T_0$ remains, forming a subtree sequence $\{T_0, T_1, T_2, T_3, \ldots, T_n\}$, where $T_0$ is unpruned, $T_1$ has one leaf node cut, and so on. The subtree sequence is then tested on an independent validation data set by cross-validation, and the optimal subtree is selected from it as the final CART decision tree-based recognition model.
Input: the decision tree $T_0$ generated by the CART algorithm.
Output: the optimal decision tree $T_\alpha$.
The specific steps are as follows:
(1) Let $k = 0$ and $T = T_0$.
(2) Let $\alpha = +\infty$.
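(3) From the bottom up, compute $C(T_t)$ and $|T_t|$ for each internal node $t$, together with
$$g(t) = \frac{C(t) - C(T_t)}{|T_t| - 1}, \qquad \alpha = \min(\alpha, g(t))$$
where $T_t$ denotes the subtree with $t$ as its root node, $C(T_t)$ is its prediction error on the training data, $C(t)$ is the prediction error when $t$ is treated as a single leaf node, and $|T_t|$ is the number of leaf nodes of $T_t$.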
(4) Visit the internal nodes $t$ from the top down; if an internal node has $g(t) = \alpha$, prune it and determine the class of the resulting leaf node $t$ by majority voting, obtaining the tree $T$.
(5) Let $k = k + 1$, $\alpha_k = \alpha$, and $T_k = T$.
(6) If $T$ is not a tree consisting of the root node alone, return to step (4).
The subtree sequence is validated by cross-validation, and the optimal subtree $T_\alpha$ that is finally selected is the final CART decision tree-based identification model.
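The generation of the subtree sequence can be sketched as below. Several points are assumptions made for illustration: the tree is a nested dictionary whose nodes store training class counts, the prediction error is counted as the number of misclassified training samples, and the sketch recomputes $g(t)$ for the current tree on every round. The cross-validation that picks the final subtree from the sequence is not shown.

```python
import copy


def node_error(node):
    """Training samples misclassified if this node were a leaf (majority vote)."""
    return sum(node["counts"].values()) - max(node["counts"].values())


def leaves(node):
    if "children" not in node:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def subtree_error(node):
    """Total training error of the leaves of the subtree rooted at this node."""
    return sum(node_error(leaf) for leaf in leaves(node))


def g(node):
    return (node_error(node) - subtree_error(node)) / (len(leaves(node)) - 1)


def internal_nodes(node):
    """Internal nodes in top-down (pre-order) order."""
    if "children" not in node:
        return []
    return [node] + [n for c in node["children"] for n in internal_nodes(c)]


def pruning_sequence(tree):
    """Return [(alpha_k, T_k)], starting with (0, T_0) and ending at the root alone."""
    tree = copy.deepcopy(tree)
    sequence = [(0.0, copy.deepcopy(tree))]
    while "children" in tree:
        alpha = min(g(n) for n in internal_nodes(tree))
        for n in internal_nodes(tree):
            if "children" in n and g(n) == alpha:
                del n["children"]  # the node becomes a leaf labelled by majority vote
        sequence.append((alpha, copy.deepcopy(tree)))
    return sequence


# Hypothetical tree; counts map class label -> number of training samples.
t0 = {
    "counts": {0: 50, 1: 50},
    "children": [
        {"counts": {0: 45, 1: 5},
         "children": [{"counts": {0: 44, 1: 2}}, {"counts": {0: 1, 1: 3}}]},
        {"counts": {0: 5, 1: 45}},
    ],
}
seq = pruning_sequence(t0)
alphas = [a for a, _ in seq]
leaf_counts = [len(leaves(t)) for _, t in seq]
```

In this example the left internal node has $g(t) = (5 - 3)/(2 - 1) = 2$ and is pruned first; the root then has $g(t) = (50 - 10)/(2 - 1) = 40$ and is pruned last, giving three candidate subtrees with 3, 2 and 1 leaves.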
After the identification model based on the LM (Levenberg-Marquardt) neural network and the identification model based on the CART (classification and regression tree) decision tree have been constructed, the effect of the two models needs to be evaluated so that one of them can be selected as the electricity stealing client identification model. In this embodiment, model effect evaluation mainly uses two methods: accuracy and the ROC (receiver operating characteristic) curve.
In the first method, the identification model based on the LM neural network and the identification model based on the CART decision tree are each evaluated with a confusion matrix. The accuracy of each model is calculated as accuracy = number of correctly predicted samples / total number of samples, and a preliminary comparison is made to select the model with the higher accuracy for application.
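The accuracy comparison can be sketched as follows. The label vectors and model predictions are made-up data, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) with 1 = electricity stealing client, 0 = normal client."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    # Correctly predicted samples divided by all samples.
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


y_true = [1, 0, 0, 1, 0, 0, 1, 0]
predictions = {
    "LM": [1, 0, 0, 1, 0, 1, 0, 0],    # one false positive, one false negative
    "CART": [1, 0, 0, 1, 0, 0, 0, 0],  # one false negative
}
scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
selected = max(scores, key=scores.get)
```

On this toy data the accuracies are 0.75 and 0.875, so the second model would be selected. Because electricity stealing clients are usually a small minority, accuracy alone can look high for a model that misses most of them, which is one reason a second, ROC-based comparison is useful.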
In the second method, the ROC (receiver operating characteristic) curves of the identification model based on the LM neural network and the identification model based on the CART decision tree are drawn, and the AUC (area under the curve) of the two ROC curves is compared. The AUC is defined as the area enclosed by the ROC curve and the coordinate axes and serves as an index of the relative merit of different models: the larger the value, the more likely the model is to predict correctly. The model with the larger AUC value is selected for application.
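A minimal sketch of the AUC comparison is given below. It assumes each model outputs a score in which a higher value means "more likely to be stealing electricity"; the ROC curve is traced by sweeping a threshold over the scores and the area is computed with the trapezoidal rule. The function names and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the decision threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    # Trapezoidal area under the piecewise-linear ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 1, 0, 1, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # hypothetical scores of one model
model_b = [0.9, 0.4, 0.7, 0.6, 0.5, 0.2]  # hypothetical scores of the other

auc_a = auc(roc_points(y_true, model_a))  # 8/9
auc_b = auc(roc_points(y_true, model_b))  # 6/9
```

The AUC equals the fraction of (stealing, normal) client pairs that the model ranks in the right order, so model A, which orders 8 of the 9 pairs correctly, would be chosen here.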
The optimal electricity stealing client identification model selected through model evaluation is then put into application to identify and give early warning of electricity stealing clients, as follows:
Based on the electricity stealing client identification model and in combination with an electricity stealing prevention monitoring system, the likelihood that a user is stealing electricity is judged comprehensively, the analysis and intelligent pushing of the client's electricity utilization characteristics are completed, and electricity stealing behaviors among clients with abnormal electricity utilization are accurately identified. By constructing a long-term mechanism for investigating abnormal electricity utilization, marketing inspection work is supported, the accuracy of electricity stealing client identification is improved, more efficient methods and means are provided for electricity stealing prevention management, the level of electricity stealing prevention management is raised, and the company's operating risk is reduced.
Based on the electricity stealing client identification method disclosed in the above embodiment, this embodiment correspondingly discloses an electricity stealing client identification device. Referring to fig. 2, the device includes:
the to-be-identified client data acquisition unit 201, configured to acquire a multidimensional index value of a client to be identified under a pre-established electricity stealing client index system;
the electricity stealing client identification unit 202, configured to input the multidimensional index value into a pre-constructed electricity stealing client identification model to obtain an identification result of the client to be identified, where the electricity stealing client identification model is whichever of the identification model based on the LM neural network and the identification model based on the CART decision tree has the better identification effect on electricity stealing clients, the two identification models are obtained by respectively training an LM neural network model and a CART decision tree model using training samples labeled as to whether the client is an electricity stealing client, and the training samples include the multidimensional index values under the electricity stealing client index system.
Optionally, the to-be-identified client data obtaining unit 201 is specifically configured to:
acquiring power consumption data of the customer to be identified in a preset time period;
extracting data corresponding to the multidimensional indexes under the electricity stealing customer index system from the electricity utilization data;
cleaning the data corresponding to the multidimensional indexes under the electricity stealing client index system, and interpolating the missing values left after cleaning by the Lagrange interpolation method;
and normalizing the data corresponding to the multidimensional indexes under the electricity stealing client index system after interpolation processing to obtain the multidimensional index values.
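The interpolation and normalization steps can be sketched as follows. The sketch makes several assumptions that the text does not fix: missing values are marked with `None`, each gap is interpolated from up to two known neighbours on each side, and min-max scaling to [0, 1] stands in for the unspecified normalization. All names and data are hypothetical.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial passing through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_missing(series, window=2):
    """Replace None entries using up to `window` known neighbours on each side."""
    filled = list(series)
    for k, v in enumerate(series):
        if v is None:
            before = [i for i in range(k - 1, -1, -1) if series[i] is not None][:window]
            after = [i for i in range(k + 1, len(series)) if series[i] is not None][:window]
            idx = sorted(before + after)
            filled[k] = lagrange_interpolate(idx, [series[i] for i in idx], k)
    return filled


def min_max_normalize(values):
    # Assumed normalization: linear scaling to the interval [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily electricity consumption with one value removed by cleaning.
daily_kwh = [1.0, 4.0, None, 16.0, 25.0]
filled = fill_missing(daily_kwh)
normalized = min_max_normalize(filled)
```

The known points here lie on a quadratic, so the interpolated value is 9 up to rounding. Keeping the window small avoids the oscillation that high-degree Lagrange polynomials can show on long, noisy series.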
Optionally, the apparatus further includes a first model building unit, specifically configured to:
determining the number of neurons of an input layer in the LM neural network according to the number of multi-dimensional indexes in the electricity stealing client index system, and determining the number of neurons of an output layer in the LM neural network as 1;
carrying out multiple convergence training on the LM neural network by using the training samples by using different initial training parameters;
acquiring the precision of a training set and the precision of a test set after each convergence training, and respectively carrying out weighted average calculation on the precision of the training set and the precision of the test set after each convergence training according to the preset precision weight of the training set and the preset precision weight of the test set to obtain the precision value of each convergence training;
and determining the network parameter corresponding to the convergence training with the highest precision value as the optimal network parameter to obtain the identification model based on the LM neural network corresponding to the optimal network parameter.
Optionally, the apparatus further includes a second model building unit, specifically configured to:
selecting, according to preset rules, one or more multidimensional indexes from the electricity stealing client index system as the division attributes of tree nodes, taking each multidimensional index in the training samples as a test variable and dividing it into branches of the tree, and repeating this process until one of the preset conditions is met, at which point tree building stops and a CART decision tree is generated;
pruning the generated CART decision tree by using a pruning algorithm to form a sub-tree sequence;
testing the sub-tree sequences on an independent verification data set through a cross verification method, and selecting an optimal sub-tree from the sub-tree sequences as the CART decision tree-based recognition model;
wherein the preset conditions include:
the number of samples in all leaf nodes of the CART decision tree is 1 or the samples belong to the same class;
the CART decision tree height reaches a threshold set by the user.
Optionally, the apparatus further comprises:
the recognition model evaluation unit is used for respectively counting the recognition accuracy of the recognition model based on the LM neural network and the recognition model based on the CART decision tree; and determining the model with the highest identification accuracy as the electricity stealing client identification model.
Optionally, the apparatus further comprises:
the identification model evaluation unit is used for respectively drawing ROC curves of the identification model based on the LM neural network and the identification model based on the CART decision tree; and determining the model with the maximum AUC under the ROC curve as the electricity stealing customer identification model.
The electricity stealing client identification device disclosed in this embodiment improves the comprehensiveness and accuracy of the data to be identified by constructing an electricity stealing client index system, and uses as the electricity stealing client identification model whichever of the identification model based on the LM (Levenberg-Marquardt) neural network and the identification model based on the CART (classification and regression tree) decision tree has the better identification effect on electricity stealing clients. The LM neural network combines the local convergence of the Gauss-Newton method with the global characteristics of the gradient descent method. The CART decision tree is efficient: the maximum number of computations for each prediction does not exceed the depth of the decision tree, and the decision tree imposes no statistical-distribution assumptions on the input data. Identifying electricity stealing clients with the better of the two models therefore makes it possible to obtain a globally optimal solution with fast convergence and high precision, which improves the efficiency of electricity stealing client identification.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above embodiments may be combined arbitrarily, and the features described in the embodiments of this specification may be replaced by or combined with one another, so that those skilled in the art can implement or use the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for identifying an electricity stealing client, comprising:
acquiring a multi-dimensional index value of a customer to be identified under a pre-constructed electricity stealing customer index system;
inputting the multidimensional index value into a pre-constructed electricity stealing client identification model to obtain an identification result of the client to be identified, wherein the electricity stealing client identification model is whichever of an identification model based on an LM (Levenberg-Marquardt) neural network and an identification model based on a CART (classification and regression tree) decision tree has the better identification effect on electricity stealing clients, the identification model based on the LM neural network and the identification model based on the CART decision tree are obtained by respectively training an LM neural network model and a CART decision tree model using training samples labeled as to whether the client is an electricity stealing client, and the training samples comprise the multidimensional index values under the electricity stealing client index system.
2. The method of claim 1, wherein the obtaining of the multi-dimensional index value of the customer to be identified under the pre-constructed electricity stealing customer index system comprises:
acquiring power consumption data of the customer to be identified in a preset time period;
extracting data corresponding to the multidimensional indexes under the electricity stealing customer index system from the electricity utilization data;
cleaning the data corresponding to the multidimensional indexes under the electricity stealing client index system, and interpolating the missing values left after cleaning by the Lagrange interpolation method;
and normalizing the data corresponding to the multidimensional indexes under the electricity stealing client index system after interpolation processing to obtain the multidimensional index values.
3. The method of claim 1, further comprising:
determining the number of neurons of an input layer in the LM neural network according to the number of multi-dimensional indexes in the electricity stealing client index system, and determining the number of neurons of an output layer in the LM neural network as 1;
carrying out multiple convergence training on the LM neural network by using the training samples by using different initial training parameters;
acquiring the precision of a training set and the precision of a test set after each convergence training, and respectively carrying out weighted average calculation on the precision of the training set and the precision of the test set after each convergence training according to the preset precision weight of the training set and the preset precision weight of the test set to obtain the precision value of each convergence training;
and determining the network parameter corresponding to the convergence training with the highest precision value as the optimal network parameter to obtain the identification model based on the LM neural network corresponding to the optimal network parameter.
4. The method of claim 1, further comprising:
selecting, according to preset rules, one or more multidimensional indexes from the electricity stealing client index system as the division attributes of tree nodes, taking each multidimensional index in the training samples as a test variable and dividing it into branches of the tree, and repeating this process until one of the preset conditions is met, at which point tree building stops and a CART decision tree is generated;
pruning the generated CART decision tree by using a pruning algorithm to form a sub-tree sequence;
testing the sub-tree sequences on an independent verification data set through a cross verification method, and selecting an optimal sub-tree from the sub-tree sequences as the CART decision tree-based recognition model;
wherein the preset conditions include:
the number of samples in all leaf nodes of the CART decision tree is 1 or the samples belong to the same class;
the CART decision tree height reaches a threshold set by the user.
5. The method of claim 1, further comprising:
respectively counting the recognition accuracy of the recognition model based on the LM neural network and the recognition model based on the CART decision tree;
and determining the model with the highest identification accuracy as the electricity stealing client identification model.
6. The method of claim 1, further comprising:
respectively drawing ROC curves of the identification model based on the LM neural network and the identification model based on the CART decision tree;
and determining the model with the maximum AUC under the ROC curve as the electricity stealing customer identification model.
7. An electricity stealing client identifying apparatus, comprising:
a to-be-identified client data acquisition unit, configured to acquire a multi-dimensional index value of a client to be identified under a pre-constructed electricity stealing client index system;
the electricity stealing client identification unit is used for inputting the multi-dimensional index values into a pre-constructed electricity stealing client identification model to obtain the identification result of the client to be identified, wherein the electricity stealing client identification model is the model with the best identification effect on the electricity stealing client in the identification model based on the LM neural network and the identification model based on the CART decision tree, the identification model based on the LM neural network and the identification model based on the CART decision tree are obtained by respectively training the LM neural network model and the CART decision tree model by utilizing training samples which are marked as electricity stealing clients or not, and the training samples comprise the multi-dimensional index values under the electricity stealing client index system.
8. The apparatus according to claim 7, wherein the client data acquiring unit to be identified is specifically configured to:
acquiring power consumption data of the customer to be identified in a preset time period;
extracting data corresponding to the multidimensional indexes under the electricity stealing customer index system from the electricity utilization data;
cleaning the data corresponding to the multidimensional indexes under the electricity stealing client index system, and interpolating the missing values left after cleaning by the Lagrange interpolation method;
and normalizing the data corresponding to the multidimensional indexes under the electricity stealing client index system after interpolation processing to obtain the multidimensional index values.
9. The apparatus according to claim 7, characterized in that the apparatus further comprises a first model construction unit, in particular for:
determining the number of neurons of an input layer in the LM neural network according to the number of multi-dimensional indexes in the electricity stealing client index system, and determining the number of neurons of an output layer in the LM neural network as 1;
carrying out multiple convergence training on the LM neural network by using the training samples by using different initial training parameters;
acquiring the precision of a training set and the precision of a test set after each convergence training, and respectively carrying out weighted average calculation on the precision of the training set and the precision of the test set after each convergence training according to the preset precision weight of the training set and the preset precision weight of the test set to obtain the precision value of each convergence training;
and determining the network parameter corresponding to the convergence training with the highest precision value as the optimal network parameter to obtain the identification model based on the LM neural network corresponding to the optimal network parameter.
10. The apparatus according to claim 7, characterized in that the apparatus further comprises a second model construction unit, in particular for:
selecting, according to preset rules, one or more multidimensional indexes from the electricity stealing client index system as the division attributes of tree nodes, taking each multidimensional index in the training samples as a test variable and dividing it into branches of the tree, and repeating this process until one of the preset conditions is met, at which point tree building stops and a CART decision tree is generated;
pruning the generated CART decision tree by using a pruning algorithm to form a sub-tree sequence;
testing the sub-tree sequences on an independent verification data set through a cross verification method, and selecting an optimal sub-tree from the sub-tree sequences as the CART decision tree-based recognition model;
wherein the preset conditions include:
the number of samples in all leaf nodes of the CART decision tree is 1 or the samples belong to the same class;
the CART decision tree height reaches a threshold set by the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111500444.8A CN114186844A (en) 2021-12-09 2021-12-09 Method and device for identifying electricity stealing clients

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111500444.8A CN114186844A (en) 2021-12-09 2021-12-09 Method and device for identifying electricity stealing clients

Publications (1)

Publication Number Publication Date
CN114186844A true CN114186844A (en) 2022-03-15

Family

ID=80604048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111500444.8A Pending CN114186844A (en) 2021-12-09 2021-12-09 Method and device for identifying electricity stealing clients

Country Status (1)

Country Link
CN (1) CN114186844A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907236A (en) * 2023-02-17 2023-04-04 西南石油大学 Underground complex condition prediction method based on improved decision tree
CN115907236B (en) * 2023-02-17 2023-11-03 西南石油大学 Underground complex situation prediction method based on improved decision tree

Similar Documents

Publication Publication Date Title
CN107506868B (en) Method and device for predicting short-time power load
CN112712209B (en) Reservoir warehousing flow prediction method and device, computer equipment and storage medium
CN113126019B (en) Remote estimation method, system, terminal and storage medium for error of intelligent ammeter
CN114519514B (en) Low-voltage transformer area reasonable line loss value measuring and calculating method, system and computer equipment
CN111860624A (en) Power grid fault information classification method based on decision tree
CN112241836B (en) Virtual load leading parameter identification method based on incremental learning
CN115015683B (en) Cable production performance test method, device, equipment and storage medium
CN115409120A (en) Data-driven-based auxiliary user electricity stealing behavior detection method
CN115358437A (en) Power supply load prediction method based on convolutional neural network
CN112381673A (en) Park electricity utilization information analysis method and device based on digital twin
CN114186844A (en) Method and device for identifying electricity stealing clients
CN115293257A (en) Detection method and system for abnormal electricity utilization user
CN116186630A (en) Abnormal leakage current data identification method and related device
CN111027841A (en) Low-voltage transformer area line loss calculation method based on gradient lifting decision tree
CN113538063A (en) Electricity charge abnormal data analysis method, device, equipment and medium based on decision tree
CN112595918A (en) Low-voltage meter reading fault detection method and device
CN111612149A (en) Main network line state detection method, system and medium based on decision tree
Pan et al. Study on intelligent anti–electricity stealing early-warning technology based on convolutional neural networks
CN114818849A (en) Convolution neural network based on big data information and anti-electricity-stealing method based on genetic algorithm
CN115147242A (en) Power grid data management system based on data mining
CN114282657A (en) Market data long-term prediction model training method, device, equipment and storage medium
CN113962508A (en) Identification method and identification device for electricity object and electronic equipment
CN109871998B (en) Power distribution network line loss rate prediction method and device based on expert sample library
CN113435915B (en) Method, device, equipment and storage medium for detecting electricity stealing behavior of user
CN110263811A (en) A kind of equipment running status monitoring method and system based on data fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination