CN112101471A - Electricity stealing probability early warning analysis method - Google Patents
- Publication number: CN112101471A
- Application number: CN202010992846.3A
- Authority: CN (China)
- Legal status: Pending (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to an electricity stealing probability early warning analysis method. A classification model of abnormal customer electricity consumption behavior is first established with a logistic regression analysis algorithm, and a discrimination model of abnormal customer electricity consumption behavior is then established with a cluster analysis algorithm. By applying logistic regression analysis and K-Means cluster analysis to users' electricity consumption behavior data, the method enables online diagnosis of on-site electricity stealing behavior, improves the efficiency of electricity stealing investigation, and reduces its cost. It builds a big data analysis model of customer electricity stealing probability, analyzes all electricity customers across multiple dimensions, accurately identifies suspected electricity stealing users, establishes systematic and standardized anti-stealing analysis, early warning, investigation and closed-loop service processes, and improves the effectiveness of anti-stealing work. Based on refined analysis results for each electricity stealing mode, it also promotes the correction of design defects in metering devices and the upgrading of anti-stealing functions.
Description
Technical Field
The invention relates to the field of big data, in particular to an electricity stealing probability early warning analysis method.
Background
To locate users suspected of "illegal electricity use and electricity stealing" quickly and accurately, a customer electricity stealing probability analysis model is built on the large volume of customer electricity consumption information accumulated in the electricity consumption information acquisition system and the marketing service application system, taking the various factors associated with electricity stealing into account. Big data analysis then supports end-to-end management, from online diagnosis of on-site electricity stealing behavior to analysis of that behavior, so that anti-stealing work can be carried out flexibly and the economic losses of the power grid recovered. Customer electricity consumption behavior information falls into two categories, static and dynamic. Static information is mainly basic customer information, such as account name, customer region, industry classification, contracted capacity, electricity consumption address, arrears information and violation records. Dynamic information mainly comprises acquisition information and metering statistics: acquisition information covers meter readings, voltage, current, phase angle and the like, while metering statistics cover line loss, electric energy, and the average electricity consumption of each industry. Although electricity stealing takes many forms, it falls broadly into two types: stealing that alters the electric energy meter hardware, and high-tech stealing that leaves the meter hardware unchanged.
The former usually produces abnormal acquisition data, so it can be detected by feature matching on various indicators; the latter usually leaves the data looking normal, so anomalies can only be distinguished from data trends.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an electricity stealing probability early warning analysis method that solves the problems of the prior art, in which manual screening and repeated on-site operations make the work cumbersome and slow and yield results of low accuracy and low reliability.
The technical solution adopted by the invention to achieve this purpose is as follows:
an electricity stealing probability early warning analysis method first establishes a classification model of abnormal customer electricity consumption behavior with a logistic regression analysis algorithm, and then establishes a discrimination model of abnormal customer electricity consumption behavior with a cluster analysis algorithm.
Establishing the classification model of abnormal customer electricity consumption behavior with the logistic regression analysis algorithm comprises the following steps:
Step 1: acquire typical electricity stealing case data and an equal proportion of normal electricity consumption behavior data;
Step 2: preprocess the typical electricity stealing case data and the normal electricity consumption behavior data in a database;
Step 3: compute descriptive statistics on the multi-dimensional features of the customer's degree of abnormality;
Step 4: perform logistic regression analysis in SPSS with 50% as the prediction threshold and the forward stepwise (likelihood ratio) method, selecting the optimal independent variables and outputting the regression coefficient β_i of each variable;
Step 5: substitute the model training result into the prediction function.
Acquiring typical electricity stealing case data comprises acquiring information on customers' illegal electricity use and electricity stealing from the marketing service application system, including electricity stealing case information, illegal use and stealing information, on-site investigation and evidence collection information, and inspection result information.
Acquiring the normal electricity consumption behavior data comprises acquiring an equal proportion of normal electricity consumption behavior data from the marketing service application system.
The multi-dimensional features of the customer's degree of abnormality include: whether three-phase current imbalance occurs, whether the electric energy meter stops or the electric quantity fluctuates abnormally, and whether an abnormal cover-opening record occurs.
Preprocessing the typical electricity stealing case data and the equal-proportion normal electricity consumption behavior data comprises merging data from multiple tables, deleting invalid values and filling null values, and then labeling each record: 1 for electricity stealing, 0 otherwise.
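The method performs this preprocessing in a database. Purely as an illustration of the order of operations (merge, delete invalid values, fill nulls, label 0/1), the sketch below does the same on in-memory records. The field names `cust_id`, `kwh` and `cover_open`, the rule that a negative `kwh` is invalid, and filling nulls with 0 are hypothetical assumptions, not details from the method.

```python
def preprocess(cases: list[dict], indicators: list[dict], theft_ids: set[str],
               fill_value: int = 0) -> list[dict]:
    """Merge, clean and label customer records (hypothetical schema)."""
    # Multi-table merge: join the two tables on the customer id.
    by_id = {row["cust_id"]: dict(row) for row in cases if row.get("cust_id")}
    merged = []
    for row in indicators:
        cid = row.get("cust_id")
        if cid not in by_id:
            continue  # no matching record in the other table: drop it
        merged.append({**by_id[cid], **row})

    cleaned = []
    for row in merged:
        # Invalid-value deletion: a negative quantity is treated as invalid (assumption).
        if row.get("kwh") is not None and row["kwh"] < 0:
            continue
        # Null filling: replace missing values with fill_value (assumption).
        row = {k: (fill_value if v is None else v) for k, v in row.items()}
        # Labeling: 1 = electricity stealing, 0 = normal.
        row["label"] = 1 if row["cust_id"] in theft_ids else 0
        cleaned.append(row)
    return cleaned
```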
Establishing the discrimination model of abnormal customer electricity consumption behavior with the cluster analysis algorithm comprises the following steps:
Step a: acquire the customer's historical electricity consumption behavior data and user static data from the marketing service application system;
Step b: preprocess the customer's historical electricity consumption behavior data and user static data in a database;
Step c: normalize the customer's electric quantity, voltage, current, power and load data, and divide the data into categories by region and electricity consumption type;
Step d: apply the K-Means cluster analysis algorithm to each category of data, choose the number of clusters K, and check whether the model has converged; if so, output the clustering result and go to step f;
Step e: otherwise, adjust the model parameters and return to step d;
Step f: generate the typical electricity consumption behavior curve of each user type from the clustering result.
Preprocessing the customer's historical electricity consumption behavior data and user static data comprises deleting invalid values and filling null values.
The invention has the following beneficial effects and advantages:
the method applies logistic regression analysis and K-Means cluster analysis to users' electricity consumption behavior data, enabling online diagnosis of on-site electricity stealing behavior, improving the efficiency of electricity stealing investigation, and reducing its cost;
the method builds a data analysis model of customer electricity stealing probability, analyzes all electricity customers across multiple dimensions, accurately identifies suspected electricity stealing users, establishes systematic and standardized anti-stealing analysis, early warning, investigation and closed-loop service processes, and improves the effectiveness of anti-stealing work;
based on refined analysis results for each electricity stealing mode, the invention promotes the correction of design defects in metering devices and the upgrading of anti-stealing functions.
Drawings
FIG. 1 is a flow chart of establishing the classification model of abnormal customer electricity consumption behavior with the logistic regression analysis algorithm according to the invention;
FIG. 2 is a flow chart of establishing the discrimination model of abnormal customer electricity consumption behavior with the cluster analysis algorithm according to the invention;
FIG. 3 is a graph of the sigmoid growth curve of the dependent variable in the logistic regression analysis algorithm of the invention;
FIG. 4 is a plot of the daily average customer load data used in the example of the invention;
FIG. 5 is a graph of the electricity consumption behavior curves of the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanying the drawings are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather should be construed as modified in the spirit and scope of the present invention as set forth in the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The method comprises the following steps:
FIG. 1 is a flow chart of establishing the classification model of abnormal customer electricity consumption behavior with the logistic regression analysis algorithm according to the invention.
1. A classification model of abnormal customer electricity consumption behavior is established with the logistic regression analysis algorithm. The process breaks down as follows:
(1) Data acquisition: obtain typical electricity stealing case data and an equal proportion of normal electricity consumption behavior data. First, collect information on customers' illegal electricity use and electricity stealing from the marketing service application system, including electricity stealing case information, illegal use and stealing information, on-site investigation and evidence collection information, and inspection result information; second, gather typical electricity stealing cases of different types from the city and county companies; third, acquire an equal proportion of normal electricity consumption behavior data from the marketing service application system;
(2) Preprocess the data in an Oracle database: merge data from multiple tables, delete invalid values, fill null values, and then label each record, with 1 for electricity stealing and 0 otherwise;
(3) For the collected typical electricity stealing cases, compute descriptive statistics on the multi-dimensional features of the customer's degree of abnormality for each electricity stealing type, mainly cross-tabulating key information such as whether three-phase current imbalance occurs, whether the electric energy meter stops or the electric quantity fluctuates abnormally, and whether abnormal cover-opening records occur;
(4) Perform logistic regression analysis in SPSS with 50% as the prediction threshold and the forward stepwise (likelihood ratio) method, selecting the optimal independent variables and outputting the regression coefficient β_i of each variable;
(5) Substitute the model training result into the prediction function:
$$z=\beta_0+\beta_1\,\text{voltage phase loss}+\beta_2\,\text{electric quantity differential anomaly}+\beta_3\,\text{abnormal electric quantity fluctuation}+\beta_4\,\text{meter stoppage}+\beta_5\,\text{power differential anomaly}+\beta_6\,\text{CT circuit}+\beta_7\,\text{current loss}+\beta_8\,\text{meter cover opening}+\beta_9\,\text{metering box door opening/closing}+\beta_{10}\,\text{constant magnetic field interference}$$
Because 50% was set in advance as the prediction threshold, a customer whose predicted probability $p=1/(1+e^{-z})$ exceeds 50% is classified as electricity stealing; otherwise the customer is classified as a normal user.
(6) Mine the latent features in the behavior data of electricity stealing users and build an electricity stealing user feature profile for anti-stealing early warning and investigation.
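Steps (5) and (6) come down to evaluating the logistic prediction function and comparing it with the 50% threshold. A minimal sketch, assuming each indicator is coded 0/1 (1 = the event occurred); the helper names `stealing_probability` and `classify` are hypothetical.

```python
import math

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function: avoids overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def stealing_probability(beta0: float, betas: dict[str, float],
                         x: dict[str, int]) -> float:
    """p = 1 / (1 + exp(-z)) with z = beta0 + sum(beta_i * x_i).

    Indicators missing from `x` are treated as 0 (event not observed).
    """
    z = beta0 + sum(b * x.get(name, 0) for name, b in betas.items())
    return _sigmoid(z)

def classify(p: float, threshold: float = 0.5) -> int:
    """1 = suspected electricity stealing (p above the threshold), 0 = normal user."""
    return 1 if p > threshold else 0
```

A probability of exactly 50% is classified as normal here, matching the "greater than 50%" wording.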
FIG. 2 is a flow chart of establishing the discrimination model of abnormal customer electricity consumption behavior with the cluster analysis algorithm according to the invention.
2. A discrimination model of abnormal customer electricity consumption behavior is established with the cluster analysis algorithm. The process breaks down as follows:
(1) Data acquisition: the data come from customers' historical electricity consumption behavior data and user static data in the marketing service application system;
(2) Preprocess the customers' historical electricity consumption behavior data in an Oracle database, deleting invalid values, filling null values and so on;
(3) Divide customers into categories by region and electricity consumption type;
(4) Normalize the customer's electric quantity, voltage, current, power, load and similar data;
(5) Apply the K-Means cluster analysis algorithm to each category of data, choose the number of clusters K from business knowledge, and check whether the model has converged;
(6) If the model converges, output the typical electricity consumption behavior curves of each user type; otherwise, adjust the model parameters, judge convergence by the objective function SSE, keep adjusting K, and finally take the run with the smallest SSE as the clustering result;
(7) Plot the typical electricity consumption behavior curve of each cluster from the clustering result;
(8) Compare each customer's electricity consumption behavior curve in new data with the typical curves, and lock onto, as abnormal, customers whose consumption does not follow the typical behavior trajectory of their profile type.
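Step (8) does not say how a customer's curve is compared with the typical curve. One simple possibility, sketched under stated assumptions, is to take the Euclidean distance between the customer's normalized curve and the typical curve of its own category and flag customers beyond a threshold. The distance measure, the threshold, and the names `curve_distance` and `lock_abnormal` are illustrative choices, not part of the source.

```python
import math

def curve_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two curves sampled at the same time points."""
    if len(a) != len(b):
        raise ValueError("curves must have the same number of points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lock_abnormal(customers: dict[str, tuple[str, list[float]]],
                  typical: dict[str, list[float]],
                  threshold: float) -> list[str]:
    """Return the ids of customers whose curve lies farther than `threshold`
    from the typical curve of their own category (hypothetical rule)."""
    flagged = []
    for cust_id, (category, curve) in customers.items():
        if curve_distance(curve, typical[category]) > threshold:
            flagged.append(cust_id)
    return sorted(flagged)
```

In practice the threshold would have to be calibrated per category, for example from the spread of distances among customers already assigned to that cluster.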
The two models used above are described as follows:
(1) Classifying abnormal customer electricity consumption behavior with logistic regression analysis. Logistic regression is a classification model in machine learning, used mainly for regression analysis when the dependent variable is categorical (here, binary); the independent variables may be categorical or continuous. It can select, from many independent variables, those that affect the dependent variable and yield a formula for prediction.
In the logistic regression algorithm, the dependent variable follows a sigmoid growth curve, as shown in FIG. 3, whose input is the linear combination:
$$z=\beta_0+\beta_1x_1+\cdots+\beta_kx_k$$
As FIG. 3 shows, the sigmoid curve changes rapidly in its middle section, so it can be used for binary classification: if the predicted value exceeds the preset threshold, the sample is assigned to class A, otherwise to class B. Introducing the feature vector and the parameters gives the prediction function:
$$P(y=1\mid x)=\frac{1}{1+e^{-z}}$$
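Meaning of β_i: when the exposure level of a given risk factor changes from x_i = 0 to x_i = 1 (all other factors held fixed), β_i is the natural logarithm of the resulting odds ratio:
$$\beta_i=\ln OR_i=\ln\frac{P_1/(1-P_1)}{P_0/(1-P_0)}$$
where P_1 and P_0 are the predicted probabilities at x_i = 1 and x_i = 0.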
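To make this interpretation concrete, the sketch below evaluates a logistic model at x_i = 1 and x_i = 0 with the other terms held fixed and checks that the log odds ratio recovers β_i. The coefficient values are arbitrary placeholders, not fitted values from the method.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_odds_ratio(p1: float, p0: float) -> float:
    """ln OR = ln[(p1 / (1 - p1)) / (p0 / (1 - p0))]."""
    return math.log((p1 / (1.0 - p1)) / (p0 / (1.0 - p0)))

# Placeholder model: z = beta0 + beta_i * x_i + (fixed contribution of other factors).
beta0, beta_i, other = -2.0, 1.5, 0.4
p_exposed = sigmoid(beta0 + beta_i * 1 + other)    # x_i = 1
p_unexposed = sigmoid(beta0 + beta_i * 0 + other)  # x_i = 0

# The log odds ratio equals beta_i, and exp(beta_i) is the odds ratio itself.
print(log_odds_ratio(p_exposed, p_unexposed), math.exp(beta_i))
```

The intercept and the other factors cancel in the ratio, which is why β_i can be read independently of them.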
Likelihood ratio test:
The test compares the log-likelihood functions of two models, one containing and one not containing the factor or factors under examination. The statistic is G:
$$G=-2(\ln L_p-\ln L_k)$$
where L_p is the likelihood of the model without the examined factors and L_k that of the model with them. For a large sample, G approximately follows a χ² distribution whose degrees of freedom equal the number of factors being tested.
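In forward stepwise selection each step adds one candidate factor, so the test has one degree of freedom. The sketch below computes G and its χ² p-value for that case only, using the identity P(χ²₁ > G) = erfc(√(G/2)) so no statistics library is needed. The function name is hypothetical, and this is not SPSS's internal computation.

```python
import math

def likelihood_ratio_test_df1(loglik_without: float,
                              loglik_with: float) -> tuple[float, float]:
    """Return (G, p-value) for adding a single factor.

    loglik_without: ln L_p, model without the candidate factor.
    loglik_with:    ln L_k, model with the candidate factor added.
    """
    g = -2.0 * (loglik_without - loglik_with)
    # Chi-square with 1 degree of freedom is the square of a standard normal,
    # so P(chi2_1 > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0)) if g > 0 else 1.0
    return g, p_value
```

For example, log-likelihoods of -100.0 and -98.0 give G = 4.0 and a p-value of about 0.046, so the factor would enter at the usual 0.05 level.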
Finally, the model is trained by maximum likelihood, that is, with the logistic regression loss function. A large number of typical electricity stealing cases are fed into the model together with a randomly selected equal proportion of normal users; latent features in the customer electricity consumption behavior data are mined, and the classification model of abnormal customer electricity consumption behavior is established.
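Logistic regression coefficients are conventionally estimated by maximizing the Bernoulli log-likelihood, which is equivalent to minimizing cross-entropy loss. The sketch below shows that idea with plain batch gradient ascent on a toy data set. It is not the SPSS fitting procedure used in the method; the learning rate and epoch count are arbitrary assumptions.

```python
import math

def fit_logistic(X: list[list[float]], y: list[int], lr: float = 0.5,
                 epochs: int = 2000) -> list[float]:
    """Maximize the Bernoulli log-likelihood by batch gradient ascent.

    Returns [beta0, beta1, ..., betak]; no regularization, illustration only.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p  # derivative of the log-likelihood with respect to z
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data: x = 0 gives 1 stealing case in 4, x = 1 gives 3 in 4.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 0, 1, 1, 1, 1, 0]
beta = fit_logistic(X, y)
```

On this toy data the estimates approach the closed-form maximum likelihood values β0 = ln(1/3) ≈ -1.099 and β1 = 2 ln 3 ≈ 2.197.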
(2) Discriminating abnormal customer electricity consumption behavior with cluster analysis. Cluster analysis is a multivariate statistical method for classifying samples or indicators; it works on a large number of samples and can group them sensibly by their own characteristics without prior knowledge. The clustering principle is that data in the same cluster are highly similar, while data in different clusters have low similarity. The partitioning method takes a data set of n objects (data rows), arbitrarily selects k objects as the initial cluster centers, and assigns each remaining object to the cluster center nearest to it. The center of each new cluster is then recomputed, and the process iterates until the objective function SSE converges. The sum of squared errors is generally used as the measure function. The K-Means algorithm generates the typical electricity consumption behavior curves of each user type, and comparing customers' curves in new data against these typical curves determines whether their electricity consumption behavior is abnormal.
(3) The K-Means calculation method is as follows:
1. Randomly select k center points;
2. Traverse all data points and assign each one to its nearest center;
3. Compute the mean of each cluster and take it as the new center;
4. Repeat steps 2-3 until the k centers no longer change (convergence) or a sufficient number of iterations has been performed.
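The four steps above correspond to Lloyd's algorithm. Below is a minimal pure-Python sketch under simple assumptions: points are equal-length lists of floats, distance is squared Euclidean, initial centers are drawn at random with a fixed seed, and an empty cluster keeps its previous center (the source does not say how empty clusters are handled).

```python
import random

def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: list[list[float]], k: int, max_iter: int = 100,
           seed: int = 0) -> tuple[list[list[float]], list[int], float]:
    """Plain K-Means; returns (centers, labels, sse)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # step 1: random centers
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Step 3: recompute each center as the mean of its cluster.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # empty cluster keeps its old center
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment against the final centers, then the objective SSE.
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse
```

Exact equality is a safe convergence test here: once the assignments stop changing, the recomputed means are bit-for-bit identical.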
(4) Convergence of the algorithm:
K-Means is in fact a strict coordinate descent procedure on the SSE. Let the objective function SSE be:
$$SSE(C_1,C_2,\ldots,C_k)=\sum_{i=1}^{k}\sum_{X\in C_i}\lVert X-C_i\rVert^{2}$$
Euclidean distance is used as the distance between variables. Optimizing one variable C_i at a time, that is, taking the partial derivative and setting it to 0, gives
$$\frac{\partial SSE}{\partial C_i}=-2\sum_{X\in C_i}(X-C_i)=0\;\Longrightarrow\;C_i=\frac{1}{m_i}\sum_{X\in C_i}X$$
where m_i is the number of elements in the cluster of C_i.
That is, in each K-Means iteration the mean of the current cluster is the optimal solution (minimum) along the current coordinate direction. Each iteration therefore cannot increase the SSE, and the SSE eventually converges.
Because the SSE is a non-convex function, K-Means cannot guarantee a globally optimal solution, only a locally optimal one. The algorithm can, however, be run several times and the run with the smallest SSE taken as the final clustering result.
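The optimality of the cluster mean derived above can be checked numerically: for a fixed cluster, moving the center away from the mean along any coordinate increases that cluster's SSE (by m_i times the squared step). The member points below are arbitrary illustrative values.

```python
def cluster_sse(members: list[list[float]], center: list[float]) -> float:
    """SSE of one cluster: sum of squared Euclidean distances to `center`."""
    return sum(sum((a - c) ** 2 for a, c in zip(p, center)) for p in members)

# Arbitrary example cluster with m_i = 3 members.
members = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
mean = [sum(col) / len(members) for col in zip(*members)]  # C_i = (1/m_i) * sum(X)
best = cluster_sse(members, mean)

# Nudge the center along each coordinate in both directions; the SSE only grows.
for dim in range(len(mean)):
    for step in (-0.1, 0.1):
        moved = list(mean)
        moved[dim] += step
        assert cluster_sse(members, moved) > best

print(mean, best)  # [2.0, 3.0] 8.0
```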
(5) 0-1 normalization:
Because the data have different dimensions (units), they are hard to compare directly. The data are therefore mapped uniformly into the range 0-1 and converted into dimensionless values, so that indicators of different units or orders of magnitude can be compared and weighted. The calculation is:
$$x^{*}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$$
where x_min and x_max are the minimum and maximum of the indicator.
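A minimal sketch of min-max normalization for one indicator column (for example electric quantity or voltage). Returning zeros for a constant column, where max equals min and the formula is undefined, is an assumption the source does not address.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """x* = (x - min) / (max - min), mapping the column onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale (assumption: map to 0).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Each indicator is normalized separately, so a voltage column and a load column end up on the same 0-1 scale.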
(6) Selecting the K value:
In practice, K-Means is generally used for data preprocessing or to assist classification labeling, and K is usually not set large. K is enumerated from 2 up to a fixed value such as 10; for each K, K-Means is run several times (to avoid local optima) and the average silhouette coefficient for that K is computed; finally, the K with the largest silhouette coefficient is chosen as the number of clusters.
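A sketch of this K-selection procedure, under assumptions the source leaves open: Euclidean distance; several random-start K-Means runs per K, of which the lowest-SSE run is kept; and the mean silhouette s(i) = (b - a) / max(a, b) of that run as the score for K, where a is the mean distance to the point's own cluster and b the smallest mean distance to another cluster. Singletons score 0, a common convention. All names are hypothetical.

```python
import math
import random

def kmeans_run(points, k, rng, max_iter=100):
    """One K-Means run from random initial centers; returns (labels, sse)."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center (assumption).
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse

def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        others = {c for c in labels if c != labels[i]}
        if not own or not others:
            scores.append(0.0)  # singleton (or single cluster): score 0
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
                / labels.count(c) for c in others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, k_values=range(2, 11), n_init=20, seed=0):
    """Return (K, score) with the largest mean silhouette; best-SSE run per K."""
    rng = random.Random(seed)
    best_k, best_score = None, -math.inf
    for k in k_values:
        if k >= len(points):
            break  # silhouette is only defined for 2 <= K <= n - 1
        labels, _ = min((kmeans_run(points, k, rng) for _ in range(n_init)),
                        key=lambda run: run[1])
        score = mean_silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score
```

Keeping the lowest-SSE run per K follows the "run several times to avoid local optima" advice; averaging the silhouette across runs would be another reading of the text.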
Example:
Typical electricity stealing case data are listed and fed into the logistic regression model to obtain the latent feature curve, from which the electricity stealing feature file is established.
Specific data preparation:
Table 1 below gives typical electricity stealing case data together with an equal proportion of normal electricity consumption behavior data, with the key fields simplified for ease of verification, and cross statistics are computed on it. The implementation proceeds as follows:
TABLE 1 Electricity stealing case data
Step 1: modeling
Using spss to make logistic regression analysis and setting 50% as prediction result threshold value, setting forward stepping likelihood ratio test method, selecting optimum independent variable and simultaneously outputting regression coefficient value beta of each variablei;
The model output results are shown in table 2 below:
variables in the equations of Table 2
a. Variable entered on step 1: power differential anomaly.
b. Variable entered on step 2: electric energy meter cover opened.
c. Variable entered on step 3: current loss.
d. Variable entered on step 4: voltage phase loss.
Table 3 Variables not in the equation
Step 2: Substitute the model training results into the prediction function.
$$x = -3.070 + 1.195\,(\text{voltage phase loss}) + 2.381\,(\text{power differential anomaly}) + 1.990\,(\text{current loss}) + 3.035\,(\text{electric energy meter cover opened})$$
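The following sketch shows how the fitted equation above and the 50% threshold from Step 1 translate into code. It assumes the standard logistic link P = 1/(1 + e^(−x)), an intercept of −3.070, and that each of the four factors enters as a 0/1 indicator. None of these is spelled out in this excerpt. The variable names are hypothetical.

```python
import math

# Coefficients from the fitted equation; reading each factor as a 0/1 flag is an assumption.
INTERCEPT = -3.070
COEFFICIENTS = {
    "voltage_phase_loss": 1.195,
    "power_differential_anomaly": 2.381,
    "current_loss": 1.990,
    "meter_cover_opened": 3.035,
}

def stealing_probability(flags):
    """Logistic prediction P = 1 / (1 + exp(-x)), x = intercept + sum(beta_i * flag_i)."""
    x = INTERCEPT + sum(beta * flags.get(name, 0) for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-x))

def is_suspected(flags, threshold=0.5):
    """Apply the 50% cut-off used when the model was built."""
    return stealing_probability(flags) >= threshold

for case in ({}, {"meter_cover_opened": 1}, {"meter_cover_opened": 1, "current_loss": 1}):
    print(case, round(stealing_probability(case), 3), is_suspected(case))
```

Under these assumptions, a customer with no anomaly flags scores about 0.044. An opened meter cover alone gives about 0.49, just under the threshold. An opened cover combined with current loss gives about 0.876.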
Step 3: Model verification.
The samples are predicted with the function from Step 2. The overall accuracy of the final model reaches 86%. The model performs particularly well on electricity-stealing users, 88% of whom are correctly identified.
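The two figures appear to be the overall and per-class percentages of a 2×2 classification table. The following sketch shows how such percentages are computed, using invented predictions that are not the data behind Table 4. The function name is hypothetical.

```python
def classification_summary(actual, predicted):
    """Overall accuracy plus percentage correct within each actual class (1 = stealing, 0 = normal)."""
    pairs = list(zip(actual, predicted))
    stealing = [p for a, p in pairs if a == 1]
    normal = [p for a, p in pairs if a == 0]
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "overall_accuracy": correct / len(pairs),
        "stealing_correct": stealing.count(1) / len(stealing),  # recall on stealing users
        "normal_correct": normal.count(0) / len(normal),
    }

# Invented labels for illustration only.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
print(classification_summary(actual, predicted))
```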
Table 4 Classification table of prediction results
Second, the historical daily average load data of 1,000 customers are compiled (the load is recorded every 15 minutes, giving 96 points per day). A K-Means cluster analysis model is built from these data and run with the K-Means algorithm until convergence is confirmed. The resulting curves are then compared to identify customers with abnormal electricity-consumption behavior.
Specific data preparation: as shown in Fig. 4.
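The following sketch turns raw 15-minute readings into the 96-point daily average curve used for clustering. It assumes each day arrives as a 96-element list and that `None` marks a missing reading. Both the input format and the function names are assumptions.

```python
POINTS_PER_DAY = 24 * 60 // 15  # one load reading every 15 minutes -> 96 points

def daily_average_curve(days):
    """Average several days of 96-point load readings into one typical-day curve.

    Each day is a list of 96 readings; None marks a missing reading (assumed format).
    """
    curve = []
    for slot in range(POINTS_PER_DAY):
        readings = [day[slot] for day in days if day[slot] is not None]
        curve.append(sum(readings) / len(readings) if readings else None)
    return curve

def slot_label(slot):
    """Clock time at the start of a 15-minute slot, e.g. slot 4 -> '01:00'."""
    minutes = slot * 15
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

day1 = [1.0] * POINTS_PER_DAY
day2 = [3.0] * POINTS_PER_DAY
day2[0] = None  # a missing reading
curve = daily_average_curve([day1, day2])
print(len(curve), curve[0], curve[1], slot_label(95))
```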
Step 1: Model building.
The experimental data are loaded into SPSS for training, and K is chosen heuristically based on general business knowledge. The algorithm converges after 3 iterations, since the cluster centers no longer change or change only slightly.
Table 6 Iteration history
Step 2: Draw the typical electricity-consumption behavior curve of each cluster from the clustering results.
The resulting electricity-consumption behavior curves are shown in Fig. 5.
Step 3: Pinpoint the abnormal customers.
The electricity-consumption behavior curve of each customer in the new data is compared with the typical curves. Customers whose consumption does not follow the typical behavior trajectory of their profile type are pinpointed as abnormal electricity-consumption customers.
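One way to sketch this comparison is shown below. It assumes each customer already has a profile type and a normalized curve. Root-mean-square deviation against a fixed threshold serves as the "does not follow the typical trajectory" test. The metric, the 0.2 threshold, the data structures, and the 4-point curves (instead of 96) are illustrative choices, not taken from the patent.

```python
import math

def rms_deviation(curve, typical):
    """Root-mean-square gap between a customer's curve and a typical curve."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve, typical)) / len(typical))

def flag_abnormal(customers, typical_curves, threshold=0.2):
    """Return ids whose curve strays from the typical curve of their own profile type.

    customers: {customer_id: (profile_type, normalized_curve)} (hypothetical structure)
    typical_curves: {profile_type: normalized_curve}, e.g. cluster centers
    """
    return [
        cid
        for cid, (ptype, curve) in customers.items()
        if rms_deviation(curve, typical_curves[ptype]) > threshold
    ]

typical = {"residential": [0.2, 0.4, 0.9, 0.5]}
customers = {
    "A": ("residential", [0.25, 0.4, 0.85, 0.5]),  # close to the typical shape
    "B": ("residential", [0.9, 0.9, 0.1, 0.1]),    # inverted daily pattern
}
print(flag_abnormal(customers, typical))
```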
Claims (3)
1. An electricity-stealing probability early-warning analysis method, characterized by comprising the following steps: first, establishing a classification model of abnormal customer electricity-consumption behavior using a logistic regression analysis algorithm, and then establishing a discrimination model of abnormal customer electricity-consumption behavior using a cluster analysis algorithm;
wherein establishing the discrimination model of abnormal customer electricity-consumption behavior using the cluster analysis algorithm comprises the following steps:
step a: acquiring the customer's historical electricity-consumption behavior data and static user data from the marketing service application system;
step b: preprocessing the customer's historical electricity-consumption behavior data and static user data in a database;
step c: normalizing the customer's electricity quantity, voltage, current, power and load data, and dividing these data into different categories according to area and electricity-usage type;
step d: applying the K-Means cluster analysis algorithm to each category of data, selecting the number of clusters K, and judging whether the model has converged; if so, outputting the clustering result and executing step f;
step e: if not, adjusting the model parameters and returning to step d;
step f: generating typical electricity-consumption behavior curves for each type of user from the clustering results.
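Steps c and d can be sketched as follows. Records are grouped by (area, electricity-usage type), and each customer's load curve is rescaled to [0, 1] so that each group can then be clustered separately. The record field names and the per-curve scaling are assumptions made for illustration.

```python
from collections import defaultdict

def scale_curve(load):
    """Min-max scale one customer's load curve to [0, 1] (constant curves map to 0)."""
    lo, hi = min(load), max(load)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in load]

def group_for_clustering(records):
    """Split records into (area, usage_type) groups of scaled curves; each group is clustered separately."""
    groups = defaultdict(dict)
    for r in records:
        groups[(r["area"], r["usage_type"])][r["customer"]] = scale_curve(r["load"])
    return dict(groups)

records = [
    {"customer": "c1", "area": "north", "usage_type": "residential", "load": [1.0, 2.0, 3.0]},
    {"customer": "c2", "area": "north", "usage_type": "commercial", "load": [5.0, 5.0, 5.0]},
    {"customer": "c3", "area": "north", "usage_type": "residential", "load": [2.0, 6.0, 4.0]},
]
for key, curves in group_for_clustering(records).items():
    print(key, curves)
```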
2. The electricity-stealing probability early-warning analysis method according to claim 1, characterized in that the preprocessing of the customer's historical electricity-consumption behavior data and static user data comprises deleting invalid values and filling null values.
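The following is a minimal sketch of this preprocessing. It assumes "invalid" means negative or above an optional plausibility limit, and that nulls are filled with the mean of the remaining values. The claim does not specify either rule, and the function name is hypothetical.

```python
def clean_readings(readings, max_valid=None):
    """Delete invalid readings, then fill nulls with the mean of the remaining values.

    'Invalid' = negative or above max_valid (an assumed rule); None = null.
    """
    def invalid(v):
        return v is not None and (v < 0 or (max_valid is not None and v > max_valid))

    kept = [v for v in readings if not invalid(v)]          # delete invalid values
    present = [v for v in kept if v is not None]
    fill = sum(present) / len(present) if present else 0.0  # assumed fill rule: mean
    return [fill if v is None else v for v in kept]          # fill null values

print(clean_readings([1.0, None, -5.0, 3.0, 1000.0], max_valid=100.0))
```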
3. The electricity-stealing probability early-warning analysis method according to claim 1, characterized in that establishing the classification model of abnormal customer electricity-consumption behavior using the logistic regression analysis algorithm comprises the following steps:
step 1: acquiring typical electricity-stealing case data and normal electricity-consumption behavior data in the same proportion;
step 2: preprocessing the typical electricity-stealing case data and the normal electricity-consumption behavior data in a database;
step 3: performing descriptive statistics on the multi-dimensional features of the customer's degree of abnormality;
step 4: performing logistic regression analysis in SPSS with 50% as the cut-off for the predicted result, using the forward stepwise likelihood-ratio method to select the optimal independent variables and output the regression coefficient β_i of each variable;
step 5: substituting the model training results into the prediction function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010992846.3A (CN112101471A) | 2020-09-21 | 2020-09-21 | Electricity stealing probability early warning analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112101471A | 2020-12-18 |
Family ID: 73760118
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202010992846.3A (CN112101471A) | Pending | 2020-09-21 | 2020-09-21 | Electricity stealing probability early warning analysis method |
Country Status (1)
Country | Link |
---|---|
CN | CN112101471A |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |