CN110503570A - Abnormal electricity consumption data detection method, system, device and storage medium - Google Patents

Abnormal electricity consumption data detection method, system, device and storage medium

Info

Publication number
CN110503570A
Authority
CN
China
Prior art keywords
data
load
abnormal
line loss
module
Legal status
Pending
Application number
CN201910641996.7A
Other languages
Chinese (zh)
Inventor
刘恬语
张涛
刘松梅
王桢干
刘伟
徐蕾
Current Assignee
Electric Industrial Co Ltd Of Strand Intense Source
Binhai County Power Supply Branch Of State Grid Jiangsu Electric Power Co Ltd
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Yancheng Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Electric Industrial Co Ltd Of Strand Intense Source
Binhai County Power Supply Branch Of State Grid Jiangsu Electric Power Co Ltd
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Yancheng Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Application filed by Electric Industrial Co Ltd Of Strand Intense Source, Binhai County Power Supply Branch Of State Grid Jiangsu Electric Power Co Ltd, State Grid Corp of China SGCC, State Grid Jiangsu Electric Power Co Ltd, Yancheng Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN201910641996.7A
Publication of CN110503570A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides an abnormal electricity consumption data detection method comprising the steps of data acquisition, data cleaning, data dimensionality reduction, model establishment and abnormal user screening. The invention further relates to an abnormal electricity consumption data detection system, an electronic device and a storage medium. The invention effectively addresses problems in line loss management. It enables data mining research and analysis of line loss anomalies in the power consumption system of a transformer area, and makes line loss management more transparent and efficient. It also brings its integrated management applications into full play, ultimately achieving energy saving, loss reduction and standardized management.

Description

Abnormal electricity consumption data detection method, system, equipment and storage medium
Technical Field
The invention relates to the technical field of electricity consumption information acquisition, and in particular to an abnormal electricity consumption data detection method.
Background
With the rapid development of the information age, the Internet and information and communication industries were the first to carry out research on big data. Big data also has profound research significance and bright application prospects for the power industry. As next-generation power systems evolve, data-driven power supply chains will gradually replace traditional ones. The widespread deployment of electricity consumption information acquisition systems gives China's power industry the data foundation it needs for two purposes: management and operational decisions, and power supply service optimization, both based on power data analysis. Meanwhile, electricity consumption data such as energy data, operating condition data and event information are growing exponentially. Their big-data characteristics are therefore becoming ever more pronounced, and the need to put electricity consumption big data to use is increasingly urgent. These massive data come mainly from various metering devices and systems. Equipment faults, communication faults, grid fluctuations, management problems and other causes give rise to a large amount of abnormal electricity consumption data. Faced with this growth, most power departments still analyze abnormal data only with traditional statistical methods and rely largely on field inspection. Limited manpower, material and financial resources prevent them from uncovering the deeper causes hidden behind the abnormal data, and the result is data overload and wasted data. Traditional analysis methods therefore struggle to meet these requirements. Data mining is needed to discover the deeper patterns behind electricity consumption anomalies, filter out incidental fluctuations, and extract the information that truly matters.
Low-voltage customers are numerous and change frequently. As a result, existing transformer-area line loss management commonly suffers from line loss with managerial causes, such as unclear user-transformer relationships, poor meter reading quality, electricity theft and metering faults. In recent years, many power supply enterprises in China have faced, to varying degrees, a common dilemma: heavy investment in transformer-area line loss management yields little return. The root cause is that, over the past decade, the main factor affecting transformer-area line loss has shifted to management losses, while the direction of investment in grid retrofits has not changed.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides an abnormal electricity consumption data detection method. The invention integrates real-time database, cloud computing and cloud real-time storage platform technologies, and uses efficient parallel computing to achieve high throughput for big-data batch processing tasks. It adopts an isolation forest algorithm, which offers good stability and strong noise resistance. The algorithm is used to identify users with abnormal data effectively, analyze the causes of line loss, and strengthen transformer-area line loss management.
The invention provides an abnormal electricity consumption data detection method, which comprises the following steps:
data acquisition: acquiring electricity consumption data by means of electricity consumption information acquisition;
data cleaning: cleaning the collected electricity consumption data and detecting the types of dirty data therein to obtain valid electricity consumption data, the types of dirty data including missing values, duplicate values, extreme values (abnormally large or small), load spikes and negative impact values;
data dimensionality reduction: performing feature dimensionality reduction on the valid electricity consumption data using daily load characteristic indexes, namely the load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate and valley-period load rate;
model establishment: building an isolation forest from a plurality of isolation trees, establishing a first analysis model with the isolation forest algorithm, and evaluating the model using evaluation curves;
abnormal user screening: screening target data with the first analysis model, mining the screened data, and identifying users with abnormal electricity consumption.
Preferably, the electricity consumption information acquisition includes cloud storage. The cloud storage stores the electricity consumption data dispersedly on a plurality of independent storage servers, and the services provided by the storage servers include a metadata management service, a volume management service and a block data management service.
Preferably, the data cleaning step further comprises filling missing (null) values according to the periodic fluctuation characteristic of the electrical load, using the following formula:
where X_i denotes the power load at the current moment and i is the moment at which load data are missing (taking values from 1 to 24). α1 is the weighting coefficient for the loads at the same moment on the preceding and following days, and α2 is the weighting coefficient for the loads at the two time points before and after the current moment.
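The formula referred to above is not reproduced in this text. The Python sketch below is illustrative only and rests on one assumption: the filled value is a weighted sum of two averages. The first is the mean load at the same hour on the preceding and following days, weighted by α1. The second is the mean of the neighbouring hours on the same day, weighted by α2. The function name `fill_missing_load`, the default weights of 0.5 and the 0-based hour index are hypothetical and do not come from the patent.

```python
from typing import Optional, Sequence


def fill_missing_load(
    prev_day: Sequence[float],
    today: Sequence[Optional[float]],
    next_day: Sequence[float],
    i: int,
    alpha1: float = 0.5,  # hypothetical weight for the adjacent-day term
    alpha2: float = 0.5,  # hypothetical weight for the same-day neighbour term
) -> float:
    """Estimate the missing load at 0-based hour index i of `today` (assumed weighted form)."""
    # Average of the same hour on the preceding and following days.
    same_hour = (prev_day[i] + next_day[i]) / 2.0
    # Neighbouring hours on the same day; skip any that are out of range or also missing.
    neighbours = [today[j] for j in (i - 1, i + 1)
                  if 0 <= j < len(today) and today[j] is not None]
    if not neighbours:
        # No usable intra-day information: fall back to the adjacent-day average.
        return same_hour
    intra_day = sum(neighbours) / len(neighbours)
    return alpha1 * same_hour + alpha2 * intra_day


if __name__ == "__main__":
    today = [9.0] * 24
    today[4], today[5], today[6] = 8.0, None, 10.0
    print(fill_missing_load([10.0] * 24, today, [12.0] * 24, 5))  # 10.0
```

The adjacent-day term captures the daily periodicity of the load, and the neighbouring-hour term captures its local level. How the two should actually be balanced is exactly what the unreproduced α1 and α2 would decide.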
Preferably, before the data acquisition step, the method further comprises:
establishing a management scheme. Transformer-area line loss management indexes are established, whose state identifiers include a coverage class, a user-transformer relationship class, a collectable class, a data class and a line loss class. State identification is then performed on the collected electricity consumption data of multiple transformer areas, and corresponding control measures are taken for the different states to form a transformer-area line loss management scheme.
Preferably, the model establishment step further comprises evaluating the model using four evaluation aids: the receiver operating characteristic (ROC) curve, the area under the curve (AUC), the cumulative recall curve, and the precision-recall (P-R) curve, which plots precision on the vertical axis against recall on the horizontal axis.
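The evaluation metrics listed above can be illustrated with a small, dependency-free Python sketch. It assumes that label 1 marks an abnormal user and that a higher score means more anomalous. AUC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen abnormal user outscores a randomly chosen normal one, with ties counted as one half. The function names `roc_auc` and `ranked_curves` are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random abnormal user > score of a random normal user)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal user")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


def ranked_curves(labels: Sequence[int],
                  scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """For k = 1..n users flagged (highest scores first): (fraction inspected, recall, precision).

    (fraction inspected, recall) traces the cumulative recall curve;
    (recall, precision) traces the P-R curve.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one abnormal user")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    points, tp = [], 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        points.append((k / n, tp / total_pos, tp / k))
    return points


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(roc_auc(labels, scores))  # 0.75
    for point in ranked_curves(labels, scores):
        print(point)
```

The ROC curve itself is not drawn here. Plotting the true-positive rate against the false-positive rate over the same ranking would give it, and its area equals the value `roc_auc` returns.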
Preferably, the isolation forest algorithm comprises a first stage and a second stage. The first stage constructs a plurality of isolation trees to form an isolation forest. The second stage evaluates test data with the generated isolation forest and calculates an anomaly score for each data point tested.
An electronic device, comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the abnormal electricity consumption data detection method.
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the abnormal electricity consumption data detection method.
An abnormal electricity consumption data detection system comprises a data acquisition module, a data cleaning module, a data dimensionality reduction module, a model establishment module and an abnormal user screening module, wherein:
the data acquisition module is configured to acquire electricity consumption data by means of electricity consumption information acquisition;
the data cleaning module is configured to clean the collected electricity consumption data and detect the types of dirty data therein to obtain valid electricity consumption data, the types of dirty data including missing values, duplicate values, extreme values, load spikes and negative impact values;
the data dimensionality reduction module is configured to perform feature dimensionality reduction on the valid electricity consumption data using daily load characteristic indexes, namely the load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate and valley-period load rate;
the model establishment module is configured to build an isolation forest from a plurality of isolation trees, establish a first analysis model with the isolation forest algorithm, and evaluate the model using evaluation curves;
the abnormal user screening module is configured to screen target data with the first analysis model, mine the screened data, and identify users with abnormal electricity consumption.
Preferably, the system further comprises a management scheme establishment module configured to perform three tasks. First, it establishes transformer-area line loss management indexes, whose state identifiers include a coverage class, a user-transformer relationship class, a collectable class, a data class and a line loss class. Second, it performs state identification on the collected electricity consumption data of multiple transformer areas. Third, it takes corresponding control measures for the different states, thereby forming a transformer-area line loss management scheme;
the data acquisition module comprises a cloud storage unit configured to store the electricity consumption data dispersedly on a plurality of independent storage servers, the services provided by the storage servers including a metadata management service, a volume management service and a block data management service;
the data cleaning module comprises a missing value filling unit configured to fill missing values according to the periodic fluctuation characteristic of the power load, using the following formula:
where X_i denotes the power load at the current moment and i is the moment at which load data are missing (taking values from 1 to 24). α1 is the weighting coefficient for the loads at the same moment on the preceding and following days, and α2 is the weighting coefficient for the loads at the two time points before and after the current moment.
Compared with the prior art, the invention has the following beneficial effects:
1) The abnormal electricity consumption data detection method is a new transformer-area line loss management method that meets the development requirements of the smart grid. It effectively addresses existing problems in transformer-area management and makes transformer-area line loss management more transparent and efficient. It also plays an integrated management role in marketing management, ultimately achieving energy saving, loss reduction and standardized management.
2) The established transformer-area line loss management index system comprises five state identifiers and their hierarchical relationship: a coverage class, a user-transformer relationship class, a collectable class, a data class and a line loss class. Each state has its own control priorities, and for each state different control methods, control periods and responsible departments are specified accordingly. This ultimately drives every transformer area to progress toward a good state.
3) The invention effectively solves problems in line loss management. It enables data mining research and analysis of line loss anomalies in the power consumption system of a transformer area, makes line loss management more transparent and efficient, and brings its integrated management applications into full play. It ultimately achieves energy saving, loss reduction and standardized management;
4) Cloud computing uses distributed software and hardware resources and information to provide high-quality, on-demand services. It has been applied successfully in many fields, such as search engines, social networks and communications. In smart grid informatization, cloud computing offers efficient access to large-scale data and parallel computing capability. It can therefore provide high-quality data processing services for information systems, including electricity consumption information acquisition systems, and solid technical support for information systems in the smart grid era.
The foregoing is only an overview of the technical solutions of the invention. To allow the technical solutions to be understood more clearly and implemented according to this description, a detailed description is given below with reference to preferred embodiments of the invention and the accompanying drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this application. They illustrate embodiments of the invention and, together with the description, serve to explain the invention; they do not unduly limit it. In the drawings:
Fig. 1 is a general flow chart of the abnormal electricity consumption data detection method according to the invention;
Fig. 2 is a schematic diagram of the state progression of the transformer-area line loss management indexes in the abnormal electricity consumption data detection method according to the invention;
Fig. 3 is a schematic diagram of isolation tree construction in the abnormal electricity consumption data detection method according to the invention;
Fig. 4 is a schematic diagram of the data dimensionality reduction process in the abnormal electricity consumption data detection method according to the invention;
Fig. 5 is a schematic diagram of the abnormal user screening method according to the invention;
Fig. 6 is a schematic diagram of the overall service-oriented architecture of the abnormal electricity consumption data detection system according to the invention;
Fig. 7 is a schematic diagram of the overall structure of the abnormal electricity consumption data detection system according to the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments or technical features described below may be combined arbitrarily to form new embodiments.
An abnormal electricity consumption data detection method, as shown in fig. 1, includes the following steps:
S0, establishing a management scheme. Transformer-area line loss management indexes are established, whose state identifiers include a coverage class, a user-transformer relationship class, a collectable class, a data class and a line loss class. State identification is performed on the collected electricity consumption data of multiple transformer areas, and corresponding control measures are taken for the different states to form a transformer-area line loss management scheme. In one embodiment, as shown in Fig. 2, the states progress as follows:
1.1 Initial transformer area: prepare the data and bring the area into hierarchical, progressive management;
1.2 Installation (coverage) class: arrange equipment installation so that coverage reaches 100%;
1.3 User-transformer relationship class: verify the user-transformer relationships so that their accuracy reaches 100%;
1.4 Collectable class: perform repeated collection and fault analysis so that the recovery rate reaches 95%;
1.5 Data class: perform repeated collection and error analysis so that the collection rate reaches 95%;
1.6 Line loss class: analyze the causes of abnormal line loss rates and formulate loss reduction measures;
1.7 Compliant transformer area: adopt consolidation measures to maintain the compliant state.
Specifically, the established transformer-area line loss management indexes comprise five state identifiers and their hierarchical relationship: a coverage class, a user-transformer relationship class, a collectable class, a data class and a line loss class. State identification is performed on multiple transformer areas based on the collected power load data. Control measures are then formulated according to the control priorities of each type of transformer area, which forms a transformer-area line loss management method based on the electricity consumption information acquisition system. The specific measures are as follows:
Coverage class: the installation rate of acquisition equipment in the transformer area has not reached 100%; a reasonable installation plan for the acquisition equipment is drawn up;
User-transformer relationship class: acquisition coverage in the transformer area has reached 100%, but the user-transformer relationship is inaccurate; the relationship is verified by combining internal data checks with external (on-site) checks;
Collectable class: acquisition coverage has reached 100%, but the collection rate is below 95%; the collection rate is tallied and the causes of missed and erroneous collection are analyzed;
Data class: coverage has reached 100%, the collection rate has reached 95% and the user-transformer relationship is correct, but the error between the collected data and manual meter readings exceeds the average; a reasonable meter reading plan is formulated;
Line loss class: coverage, collection rate and accuracy all reach 100% and the user-transformer relationship is correct, but the line loss rate is abnormal; the causes of the abnormal line loss rate are analyzed promptly and loss reduction measures are formulated.
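The state identification above behaves like a chain of gates. The following Python sketch is one possible reading. It assumes the states are checked in the Fig. 2 progression order and that each transformer area is assigned the first state whose condition it fails. The 95% collection threshold comes from the text. The field names and the 0 to 10% band for a "normal" line loss rate are hypothetical. The sketch also relaxes the stricter 100% conditions that the text states for the line loss class.

```python
from dataclasses import dataclass

# Hypothetical band for a "normal" line loss rate; the text gives no numbers.
LINE_LOSS_LOW, LINE_LOSS_HIGH = 0.0, 0.10
COLLECTION_THRESHOLD = 0.95  # from the text


@dataclass
class AreaMetrics:
    coverage_rate: float     # share of users with acquisition equipment installed (0..1)
    relation_correct: bool   # user-transformer relationship verified as correct
    collection_rate: float   # collection rate (0..1)
    reading_error: float     # error between collected data and manual meter readings
    avg_reading_error: float # average error used as the reference level
    line_loss_rate: float    # line loss rate of the transformer area (0..1)


def classify_area(m: AreaMetrics) -> str:
    """Return the first failing state, checked in the Fig. 2 progression order."""
    if m.coverage_rate < 1.0:
        return "coverage"          # arrange an acquisition equipment installation plan
    if not m.relation_correct:
        return "user-transformer"  # verify the user-transformer relationship
    if m.collection_rate < COLLECTION_THRESHOLD:
        return "collectable"       # analyse missed and erroneous collection
    if m.reading_error > m.avg_reading_error:
        return "data"              # draw up a reasonable meter reading plan
    if not (LINE_LOSS_LOW <= m.line_loss_rate <= LINE_LOSS_HIGH):
        return "line loss"         # analyse causes, formulate loss reduction measures
    return "compliant"             # maintain the compliant state


if __name__ == "__main__":
    print(classify_area(AreaMetrics(1.0, True, 1.0, 0.5, 1.0, 0.2)))  # line loss
```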
S1, data acquisition: electricity consumption data are acquired by means of electricity consumption information acquisition. In one embodiment, cloud storage technology is used to collect, classify and process the electricity consumption information of multiple line-loss transformer areas. Under the distributed file storage mechanism of cloud storage, the electricity consumption information is stored dispersedly on multiple independent storage servers, and the mechanism comprises volume management, metadata management and block data management services;
Metadata comprise the name and attributes of a file and the locations of its data blocks. Because metadata are accessed frequently, they are loaded and cached in memory for management, which improves access efficiency.
Block data are the data blocks formed by dividing file data into pieces of a fixed size, and they are distributed across different storage node servers. The storage space provided by a pair of metadata servers and the storage server nodes they manage is called a volume space;
The volume management server virtualizes and integrates multiple volumes and presents a unified, global cloud real-time storage platform space for external access.
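A toy in-memory Python model makes the division of labour between metadata, block data and volume management concrete. It is only an illustration, and several of its choices are hypothetical simplifications: one metadata table per volume instead of a pair of metadata servers, a 4-byte block size, round-robin block placement, and a fewest-files rule for choosing a volume. No networking, caching policy or replication is modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the split is visible


@dataclass
class StorageNode:
    """A storage node server holding blocks keyed by (file name, block index)."""
    name: str
    blocks: Dict[Tuple[str, int], bytes] = field(default_factory=dict)


@dataclass
class Volume:
    """Storage space of one metadata table and the storage nodes it manages."""
    nodes: List[StorageNode]
    # In-memory metadata: file name -> [(node index, block index), ...]
    metadata: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> None:
        locations = []
        for idx, start in enumerate(range(0, len(data), BLOCK_SIZE)):
            node_idx = idx % len(self.nodes)  # spread blocks round-robin over nodes
            self.nodes[node_idx].blocks[(name, idx)] = data[start:start + BLOCK_SIZE]
            locations.append((node_idx, idx))
        self.metadata[name] = locations

    def get(self, name: str) -> bytes:
        # The metadata lookup yields block locations; blocks are reassembled in order.
        return b"".join(self.nodes[n].blocks[(name, i)] for n, i in self.metadata[name])


class VolumeManager:
    """Integrates several volumes into one namespace for external access."""

    def __init__(self, volumes: List[Volume]) -> None:
        self.volumes = volumes
        self.location: Dict[str, int] = {}  # file name -> volume index

    def put(self, name: str, data: bytes) -> None:
        v = self.location.get(name)
        if v is None:
            # New file: place it on the volume that currently holds the fewest files.
            v = min(range(len(self.volumes)), key=lambda i: len(self.volumes[i].metadata))
        self.volumes[v].put(name, data)
        self.location[name] = v

    def get(self, name: str) -> bytes:
        return self.volumes[self.location[name]].get(name)


if __name__ == "__main__":
    vm = VolumeManager([Volume([StorageNode("s1"), StorageNode("s2")]),
                        Volume([StorageNode("s3")])])
    vm.put("meter_001.csv", b"2019-07-01,1.25\n")
    print(vm.get("meter_001.csv"))
```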
S2, data cleaning: the collected electricity consumption data are cleaned and the types of dirty data therein are detected to obtain valid electricity consumption data. The types of dirty data include missing values, duplicate values, extreme values, load spikes and negative impact values. In one embodiment, the types of dirty data are analyzed and summarized first. Targeted measures are then taken according to how each type manifests, so as to remove redundant data from the data set while preserving its integrity. The common dirty data types are: 1) missing values: null entries in the table; 2) duplicate values: a user's electric load is recorded more than once for the same moment; 3) extreme values: load data that are abnormally large or small; 4) load spikes: a sudden rise or fall between the data of adjacent time periods; 5) negative impact values: the meter reading keeps decreasing over a period of time.
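The five dirty-data types can be flagged with simple rules. The Python sketch below is an assumption-laden illustration, not the patent's procedure. The plausibility bounds `low`/`high`, the spike factor `spike_ratio` and the minimum run length `min_run` are hypothetical parameters. A load spike is judged against the mean of the neighbouring usable points. Negative impact values are checked on cumulative meter readings, which should never decrease.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def detect_dirty_load(
    records: Sequence[Tuple[int, Optional[float]]],
    low: float = 0.0,          # hypothetical lower plausibility bound
    high: float = 1000.0,      # hypothetical upper plausibility bound
    spike_ratio: float = 3.0,  # hypothetical spike factor
) -> Dict[str, List[int]]:
    """Flag record indices of one user's (time index, load) series by dirty-data type."""
    flags: Dict[str, List[int]] = {"missing": [], "duplicate": [], "extreme": [], "spike": []}
    seen = set()
    usable: List[Tuple[int, float]] = []  # (record index, load) of points passing basic checks
    for idx, (t, load) in enumerate(records):
        if t in seen:              # same moment recorded again
            flags["duplicate"].append(idx)
            continue
        seen.add(t)
        if load is None:           # null entry
            flags["missing"].append(idx)
        elif not (low <= load <= high):
            flags["extreme"].append(idx)
        else:
            usable.append((idx, load))
    # Spike: a usable point far above or below the mean of its usable neighbours.
    for k in range(1, len(usable) - 1):
        idx, x = usable[k]
        ref = (usable[k - 1][1] + usable[k + 1][1]) / 2.0
        if ref > 0 and (x > spike_ratio * ref or x < ref / spike_ratio):
            flags["spike"].append(idx)
    return flags


def detect_negative_impact(readings: Sequence[float], min_run: int = 2) -> List[int]:
    """Indices where a cumulative meter reading keeps decreasing for >= min_run steps."""
    flagged: List[int] = []
    run: List[int] = []
    for j in range(1, len(readings)):
        if readings[j] < readings[j - 1]:
            run.append(j)
        else:
            if len(run) >= min_run:
                flagged.extend(run)
            run = []
    if len(run) >= min_run:
        flagged.extend(run)
    return flagged
```

A usage example: for the series `[(0, 10.0), (1, 11.0), (1, 11.0), (2, None), (3, 12.0), (4, 60.0), (5, 13.0), (6, -5.0), (7, 14.0)]` this sketch flags index 2 as a duplicate, 3 as missing, 7 as extreme and 5 as a spike. In practice the bounds and ratios would be tuned to the user class and the acquisition interval.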
For severely missing data, the periodic fluctuation characteristic of the electric load is exploited as follows. First, the average is calculated of the loads at the same moment on the two adjacent days and of the loads at the two time points before and after the current moment. The vacancy is then filled by adding to this average the load change amount obtained with the load change rate method based on the following day. The calculation method is as follows:
where X_i represents the power load at the current moment and i is the moment at which load data are missing (taking values from 1 to 24). α1 is the weighting coefficient for the loads at the corresponding moment on the preceding and following days, and α2 is the weighting coefficient for the loads at the two time points before and after the current moment. For abnormal noise-point data, the rectangle method is used to integrate the load data collected at each moment of the day and so calculate the repaired electricity quantity, using the following formula:
where X_i is the repaired electricity quantity, F is the number of load data acquisitions per day, P_i is the load data at moment i, and ΔT is the load data acquisition interval.
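The rectangle-method formula is not reproduced in this text. The sketch below assumes the usual rectangle-rule sum, X = Σ P_i·ΔT over the F samples of the day with ΔT = 24 h / F, and computes the repaired electricity quantity in Python. The function name `repaired_energy` and the kW/kWh units are assumptions.

```python
from typing import Sequence


def repaired_energy(loads: Sequence[float], samples_per_day: int) -> float:
    """Daily electricity quantity by the rectangle rule: sum of P_i * dT over F samples.

    loads: the F load samples of one day (kW, assumed); dT = 24 h / F.
    Returns kWh under that unit assumption.
    """
    if len(loads) != samples_per_day:
        raise ValueError("expected one load value per acquisition time")
    delta_t = 24.0 / samples_per_day  # acquisition interval in hours
    return sum(p * delta_t for p in loads)


if __name__ == "__main__":
    # 96 samples at 15-minute intervals, constant 2 kW load.
    print(repaired_energy([2.0] * 96, 96))  # 48.0
```

Each rectangle has height P_i and width ΔT, so the sum approximates the area under the daily load curve, which is the day's energy.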
S3, data dimensionality reduction: feature dimensionality reduction is performed on the valid electricity consumption data using daily load characteristic indexes. These indexes are the load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate and valley-period load rate. In one embodiment, as shown in Fig. 4, the load curve is a time series and is easily affected by factors such as air temperature, income and electricity price policy. These influences are intrinsic characteristics of the time-series data that a distance measure cannot adequately reflect, so distance alone cannot guarantee similarity in the shape or contour of the time series. Moreover, curves with a pronounced load shape, such as daily load curves, may become undesirably equidistant in high dimensions, meaning that the distances between different curves tend to become nearly equal. This embodiment therefore selects six common daily load characteristic indexes, to reflect the similarity between loads fully while keeping computation efficient. The indexes are the load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate and valley-period load rate. Together they reflect the electricity consumption characteristics of various users from four perspectives: the whole day, the peak period, the flat period and the valley period. The six indexes are used to reduce the dimensionality of the valid load curve matrix.
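The six daily load characteristic indexes are named here but not defined. The Python sketch below uses commonly seen definitions, all of them assumptions:

- load rate = average load / maximum load;
- peak-valley difference rate = (maximum − minimum) / maximum;
- maximum utilization hour rate = (daily energy / maximum load) / 24 h;
- each period load rate = average load in that period / daily maximum.

The time-of-use periods are also hypothetical. With equally spaced samples, this utilization-hour definition coincides numerically with the load rate, so the patent's intended definition may differ.

```python
from typing import Dict, Sequence, Set

# Hypothetical time-of-use periods as 0-based hour indices (8 hours each).
PEAK_HOURS: Set[int] = set(range(8, 12)) | set(range(17, 21))
VALLEY_HOURS: Set[int] = set(range(0, 8))
FLAT_HOURS: Set[int] = set(range(24)) - PEAK_HOURS - VALLEY_HOURS


def daily_load_features(load: Sequence[float]) -> Dict[str, float]:
    """Reduce a 24-point hourly load curve to six characteristic indexes (assumed definitions)."""
    if len(load) != 24:
        raise ValueError("expected 24 hourly load values")
    p_max, p_min = max(load), min(load)
    if p_max <= 0:
        raise ValueError("daily maximum load must be positive")
    p_avg = sum(load) / 24.0
    energy = sum(load)  # kWh when samples are hourly kW values

    def period_rate(hours: Set[int]) -> float:
        # Average load within the period relative to the daily maximum.
        return sum(load[h] for h in hours) / len(hours) / p_max

    return {
        "load_rate": p_avg / p_max,
        "peak_valley_diff_rate": (p_max - p_min) / p_max,
        "max_utilization_hour_rate": (energy / p_max) / 24.0,
        "peak_load_rate": period_rate(PEAK_HOURS),
        "flat_load_rate": period_rate(FLAT_HOURS),
        "valley_load_rate": period_rate(VALLEY_HOURS),
    }


if __name__ == "__main__":
    curve = [1.0] * 8 + [4.0] * 4 + [2.0] * 5 + [4.0] * 4 + [2.0] * 3
    for name, value in daily_load_features(curve).items():
        print(f"{name}: {value:.3f}")
```

Stacking these six values per user turns a 24-column (or 96-column) load matrix into a 6-column feature matrix, which is what the isolation trees below split on.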
S4, model building: multiple isolation trees are combined into an isolation forest, a first analysis model is established with the isolation forest algorithm, and the model is evaluated with evaluation curves;
In one embodiment, as shown in Fig. 3, an isolation tree (iTree) is constructed as follows: 1. randomly select one feature from the six daily load characteristic indexes; 2. randomly select a value k of that feature; 3. split the records on that feature, placing records whose value is less than k on the left branch and records whose value is greater than or equal to k on the right branch; 4. recursively construct the left and right branches until either a. the data set reaching the node contains only one record or several identical records, or b. the tree reaches the height limit.
Specifically, an isolation forest consisting of t iTrees is constructed as follows:
1. Randomly select ψ sample points from the training data as a subsample and place them at the root node of the tree;
2. Randomly specify a dimension and randomly generate a cut point P within the range of the current node's data in that dimension;
3. Use the cut point to generate a hyperplane that divides the current node's data space into two subspaces: data smaller than P in the specified dimension go to the left child of the current node, and data greater than or equal to P go to the right child;
4. Recursively repeat steps 2 and 3 in the child nodes, continuing to construct new child nodes until the data can no longer be split or the depth of the tree reaches $\log_2 \psi$.
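The construction steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the patent's implementation: trees are nested dictionaries, the cut point P is drawn uniformly between the node's minimum and maximum in the chosen dimension (as in Liu et al.'s original iForest), only dimensions that still vary are eligible, the depth limit is log2 ψ rounded up, and the defaults t = 100 and ψ = 256 are the values suggested in that original paper. The path length adds c(size) at leaves holding more than one record, which is the standard adjustment. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_itree(data: List[Sequence[float]], depth: int, max_depth: int,
                rng: random.Random) -> Dict:
    """Steps 2-4: pick a dimension and cut point P, split, and recurse."""
    n_dims = len(data[0]) if data else 0
    # Only dimensions that still vary can be split; none means identical records.
    dims = [d for d in range(n_dims)
            if min(r[d] for r in data) < max(r[d] for r in data)]
    if len(data) <= 1 or depth >= max_depth or not dims:
        return {"size": len(data)}
    dim = rng.choice(dims)
    lo = min(r[dim] for r in data)
    hi = max(r[dim] for r in data)
    p = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": p,
        "left": build_itree([r for r in data if r[dim] < p], depth + 1, max_depth, rng),
        "right": build_itree([r for r in data if r[dim] >= p], depth + 1, max_depth, rng),
    }


def build_iforest(data: List[Sequence[float]], n_trees: int = 100, psi: int = 256,
                  seed: int = 0) -> Tuple[List[Dict], int]:
    """Step 1 for each of the t trees: grow it from a random subsample of psi points."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_itree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def path_length(x: Sequence[float], node: Dict) -> float:
    """Depth of the leaf x falls into, plus c(size) for leaves holding several records."""
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def mean_path_length(x: Sequence[float], trees: List[Dict]) -> float:
    """E(h(x)): average path length of x over all trees."""
    return sum(path_length(x, t) for t in trees) / len(trees)


if __name__ == "__main__":
    gen = random.Random(1)
    users = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(200)] + [[8.0, 8.0]]
    trees, psi = build_iforest(users, n_trees=100, psi=64)
    # The outlying user should be isolated in fewer splits on average.
    print(mean_path_length([8.0, 8.0], trees), mean_path_length([0.0, 0.0], trees))
```

In the patent's setting each record would be a user's six daily load indexes; two features are used here only to keep the example small.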
S5, abnormal user screening: the first analysis model built in step S4 is used to screen the target data, the screened data are mined, and users with abnormal electricity consumption are identified. In one embodiment, as shown in Fig. 5, several diverse iTrees form the iForest, and the model is evaluated with the ROC curve and AUC, the cumulative recall curve, and the P-R curve. The iForest evaluates one user at a time, and every evaluation traverses all iTrees: the leaf node into which the query object falls is recorded in each tree, and the anomaly score is calculated from the average path length. Finally, each user is assessed according to the anomaly score to decide whether the user under test is abnormal.
Specifically, the receiver operating characteristic (ROC) curve remains unchanged when the distribution of positive and negative samples in the test set changes. For the continuous scores output by a binary classification model, samples scoring above a threshold are classified as positive and samples scoring below it as negative. Lowering the threshold identifies more positives and raises the recall (true positive rate), but it also classifies more negative samples as positive and so raises the false positive (false alarm) rate. The ROC curve visualizes this trade-off: in ROC space, the point (0, 1) represents the ideal classifier, and the closer the curve lies to (0, 1), the better the classification. The AUC is the area under the ROC curve: AUC = 1 corresponds to an ideal classifier, AUC = 0.5 is equivalent to random guessing (the model has no predictive value), and 0.5 < AUC < 1 means the model performs better than random guessing.
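The threshold sweep described above can be sketched as follows: samples are ranked by score (for example, the iForest anomaly score, with abnormal users labeled 1), the threshold is lowered one distinct score at a time, and each step yields one (false positive rate, true positive rate) point; the AUC is then the trapezoidal area under these points. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold from the highest score."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive (1) and negative (0) samples")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], the curve passes through (0, 0.5) and (0.5, 1) and the AUC is 0.75: better than random guessing, but not ideal.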
For the P-R curve, precision is plotted on the vertical axis against recall on the horizontal axis. As the classification threshold decreases from large to small, precision generally falls while recall rises; when comparing classifiers, the closer the P-R curve lies to the point (1, 1), the better the classification.
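The P-R curve can be traced in the same way, recording (recall, precision) at each threshold as it decreases. A minimal sketch, with hypothetical names:

```python
from typing import List, Sequence, Tuple


def pr_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(recall, precision) pairs as the threshold decreases from large to small."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive (1) sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores are admitted together, giving one point per distinct threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points
```

For labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], the points are (0.5, 1.0), (0.5, 0.5), (1.0, 0.667), and (1.0, 0.5), which shows that precision does not necessarily fall monotonically as the threshold decreases.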
Specifically, the generated iForest is used to evaluate the test data and compute an anomaly score for each detected sample. Each sample x is passed through every iTree to obtain its path length h(x) in that tree and the average E(h(x)) over all trees, from which its anomaly score is computed. The anomaly score of a detected sample x is defined as:
$$s(x,\psi) = 2^{-\frac{E(h(x))}{c(\psi)}}, \qquad c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}$$
where h(x) is the path length of x in an iTree, i.e., the depth of the node at which x ends up; E(h(x)) is the average of h(x) over all t iTrees; c(ψ) is the average path length of a binary search tree built from ψ points, used to normalize h(x); and
$H(k) = \ln(k) + \zeta$, where ζ is Euler's constant (approximately 0.5772).
From this definition: when E(h(x)) → 0, s → 1; when E(h(x)) → ψ - 1, s → 0; and when E(h(x)) → c(ψ), s → 0.5. Thus, the closer s(x) is to 1, the more likely x is anomalous, and the closer it is to 0, the more likely x is a normal point.
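The score definition translates directly into code. The sketch below computes c(ψ) and s(x, ψ) from a sample's average path length and then screens users by score; the 0.6 cut-off and the function names are hypothetical, since the text only says that scores near 1 indicate anomalies.

```python
import math
from typing import Dict, List

EULER_GAMMA = 0.5772156649015329  # zeta in H(k) = ln(k) + zeta


def harmonic(k: float) -> float:
    """Approximation H(k) = ln(k) + zeta."""
    return math.log(k) + EULER_GAMMA


def c(psi: int) -> float:
    """Normaliser c(psi) = 2H(psi - 1) - 2(psi - 1)/psi, with the usual small-n cases."""
    if psi > 2:
        return 2.0 * harmonic(psi - 1) - 2.0 * (psi - 1) / psi
    return 1.0 if psi == 2 else 0.0


def anomaly_score(mean_path: float, psi: int) -> float:
    """s(x, psi) = 2 ** (-E(h(x)) / c(psi))."""
    return 2.0 ** (-mean_path / c(psi))


def screen_users(mean_paths: Dict[str, float], psi: int,
                 threshold: float = 0.6) -> List[str]:
    """Flag users whose score exceeds a HYPOTHETICAL threshold, most anomalous first."""
    scores = {user: anomaly_score(h, psi) for user, h in mean_paths.items()}
    flagged = [user for user, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda user: scores[user], reverse=True)
```

For ψ = 256, a user whose average path length equals c(256) (about 10.24) scores exactly 0.5, while an average path length of 1 gives a score of about 0.93.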
An electronic device, comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program including instructions for performing the abnormal electricity consumption data detection method.
A computer-readable storage medium storing a computer program which, when executed by a processor, performs the abnormal electricity consumption data detection method.
As shown in Fig. 7, an abnormal electricity consumption data detection system comprises a data acquisition module, a data cleaning module, a data dimension reduction module, a model building module, and an abnormal user screening module, wherein:
the data acquisition module is used to collect electricity consumption data by means of electricity consumption information collection;
the data cleaning module is used to clean the collected electricity consumption data and detect the types of dirty data in it, so as to obtain effective electricity consumption data; the types of dirty data include missing values, duplicate values, extreme (maximum and minimum) values, load spikes, and sudden negative values;
the data dimension reduction module is used to perform feature dimension reduction on the effective electricity consumption data using daily load characteristic indexes, namely the load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate, and valley-period load rate;
the model building module is used to build an isolation forest from multiple isolation trees, establish a first analysis model with the isolation forest algorithm, and evaluate the model with evaluation curves;
the abnormal user screening module is used to screen target data with the first analysis model, mine the screened data, and identify users with abnormal electricity consumption.
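A minimal sketch of dirty-data detection for one user's load series, assuming readings arrive as (timestamp, load in kW) pairs with None marking a missing value. The upper limit for extreme values and the spike ratio are hypothetical, minimum-value extremes are covered only by the negative-value check, and each reading receives at most one label.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def detect_dirty_data(readings: Sequence[Tuple[int, Optional[float]]],
                      upper_kw: float = 1000.0,
                      spike_ratio: float = 3.0) -> Dict[str, List[int]]:
    """Label each reading (timestamp, load in kW or None) with at most one dirty-data type.

    upper_kw and spike_ratio are HYPOTHETICAL thresholds chosen for illustration.
    """
    issues: Dict[str, List[int]] = {
        "missing": [], "duplicate": [], "extreme": [], "spike": [], "negative": []}
    values = [v for _, v in readings]
    seen = set()
    for k, (ts, v) in enumerate(readings):
        if ts in seen:                      # repeated timestamp
            issues["duplicate"].append(k)
            continue
        seen.add(ts)
        if v is None:                       # missing value
            issues["missing"].append(k)
        elif v < 0:                         # negative value
            issues["negative"].append(k)
        elif v > upper_kw:                  # beyond a plausible maximum
            issues["extreme"].append(k)
        else:
            neighbours = [values[j] for j in (k - 1, k + 1)
                          if 0 <= j < len(values) and values[j] is not None]
            # A spike ("burr") towers over the average of both neighbours.
            if len(neighbours) == 2 and v > spike_ratio * sum(neighbours) / 2:
                issues["spike"].append(k)
    return issues
```

For example, a 40 kW reading between two readings of about 5 kW is labeled a spike, and a second reading with an already-seen timestamp is labeled a duplicate.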
The system further comprises a management scheme establishment module, which is used to establish transformer-area line loss management indexes whose status identifiers include a coverage class, a user-transformer relation class, a collectability class, a data class, and a line loss class; to identify the status of the electricity consumption data collected from multiple transformer areas; and to take corresponding control measures for different statuses so as to form a transformer-area line loss management scheme;
the data acquisition module comprises a cloud storage unit, which is used to store the electricity consumption data in a distributed manner on multiple independent storage servers; the storage servers are of three types: metadata management, volume management, and block data management;
the data cleaning module comprises a missing value filling unit, which fills missing values according to the periodic fluctuation characteristics of the power load using the following formula:
where $X_i$ denotes the power load at the current moment, $i$ is the moment at which load data is missing, taking values from 1 to 24, and $a_1$ and $a_2$ are the weighting coefficients for the loads at the same moment on the preceding and following days and at the two time points before and after the current moment.
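Of the five status classes, the line loss class is the easiest to illustrate. The sketch computes the transformer-area line loss rate with the usual definition, (supplied energy - sold energy) / supplied energy × 100%, and assigns a status; the 0 to 10% qualified band, the status names, and the function names are hypothetical.

```python
def line_loss_rate(supplied_kwh: float, sold_kwh: float) -> float:
    """Transformer-area line loss rate in percent: (supplied - sold) / supplied * 100."""
    if supplied_kwh <= 0:
        raise ValueError("supplied energy must be positive")
    return (supplied_kwh - sold_kwh) / supplied_kwh * 100.0


def line_loss_status(rate_pct: float, upper_pct: float = 10.0) -> str:
    """HYPOTHETICAL status rule: the 0-10 % qualified band is an assumption."""
    if rate_pct < 0:
        return "negative loss"  # sales exceed supply; check user-transformer mapping, meters
    if rate_pct > upper_pct:
        return "high loss"      # candidate for theft or metering-fault inspection
    return "qualified"
```

For example, 1000 kWh supplied and 973.1 kWh sold gives a loss rate of 2.69%, which this rule labels "qualified".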
In a specific embodiment, the system is developed with a service-oriented architecture (SOA) as its overall design concept. The data acquisition module uses a dedicated-transformer terminal supporting the 05-version protocol, which collects the voltage, current, and electricity quantity readings of each user's electricity meter every 15 minutes, i.e., 96 points per 24 hours; the data set S is an n × 24 initial load curve matrix formed from n daily load curves. The module stores the collected mass data in a distributed manner using cloud storage technology. Processing the data shows that, from September 2018 to March 2019, a county power supply company had 701 transformer areas with a total capacity of about 349,000 kVA (an average of 497.8 kVA per area), an accumulated electricity loss of 4.6 kilo kWh, and an average transformer-area line loss rate of 2.69%.
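The text states that meters are read every 15 minutes (96 points per day) yet describes S as an n × 24 matrix, without saying how the 96 points are reduced to 24. The sketch below assumes each hourly value is the average of its four 15-minute readings; the function names are hypothetical.

```python
from typing import List, Sequence


def to_hourly(points_96: Sequence[float]) -> List[float]:
    """Average each group of four 15-minute readings into one hourly value (ASSUMED)."""
    if len(points_96) != 96:
        raise ValueError("expected 96 readings (one every 15 minutes)")
    return [sum(points_96[4 * h:4 * h + 4]) / 4 for h in range(24)]


def load_curve_matrix(days: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stack n daily curves into the n x 24 initial load curve matrix S."""
    return [to_hourly(day) for day in days]
```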
Further, as shown in Fig. 6, the collection cluster periodically collects information from user terminals and stores the data in the cloud storage and query environment by calling the storage interface. The data storage and query environment stores the collected information under high concurrency and provides electricity consumption data indexing and efficient query functions to the upper layers. The parallel ETL (Extraction-Transformation-Loading) environment exchanges data between the file information in the original relational database and the cloud computing environment: an ETL management tool establishes the data table mappings and task execution policies, and the system uses the parallel ETL tool to track, collect, and check the consistency of data in the associated systems in real time. The parallel analysis and computing environment runs the isolation forest algorithm to mine abnormal data. The front-end interfaces, including an SQL (Structured Query Language) interface, Web services, and client packages, provide query, analysis, and computation services to external systems. The mapping tool uses a query-rewriting-based SQL-to-Map/Reduce optimization technique: it converts the original SQL into a query graph and applies rewrite rules to evolve it into various equivalent forms. This supports migrating applications implemented as stored procedures to the cloud computing environment, together with correctness verification and performance optimization, which can greatly reduce the cost of migrating relational database applications to cloud computing, improve development efficiency, and improve the overall performance of parallel computing.
The cloud storage unit uses the parallel ETL environment to decompose the original computation-intensive complex tasks into atomic subtasks, distribute them to different task processing nodes, and process them concurrently, which increases data processing efficiency and capacity and ensures data processing performance.
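The decompose, distribute, and merge pattern described above can be sketched on a single machine, with worker threads standing in for task processing nodes: each transformer area's energy total is an atomic subtask, and the per-area results are merged into one table. This is an illustration only; the patent's parallel ETL environment is a cluster system, and the names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple


def area_energy(task: Tuple[str, Sequence[Sequence[float]]]) -> Tuple[str, float]:
    """Atomic subtask: total energy of one transformer area's daily curves."""
    area_id, curves = task
    return area_id, float(sum(sum(day) for day in curves))


def parallel_area_energy(areas: Dict[str, List[List[float]]],
                         workers: int = 4) -> Dict[str, float]:
    """Map the subtasks onto worker threads, then merge the results into one table."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(area_energy, areas.items()))
```

`ThreadPoolExecutor.map` returns results in input order, so the merge step is deterministic even though the subtasks run concurrently.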
The system further comprises a loss reduction decision support module, consisting mainly of a loss reduction decision support function and a loss reduction scheme library management part. The module is used to inspect users with abnormal electricity consumption data, focusing on the following:
a. whether electricity theft occurs in the transformer area;
b. whether the load operation of the transformer area has changed and whether the area itself has been modified;
c. whether the transformer in the area is lightly or heavily loaded;
d. the operating condition of reactive power compensation equipment;
e. whether the three-phase load is balanced;
f. voltage quality;
g. whether the transformers, lines, and metering equipment are reasonably configured and operating normally; abnormalities of electric energy metering devices are mainly caused by meter faults, instrument transformer faults, junction box faults, terminal faults, and the like;
h. whether the low-voltage power supply radius is too long;
i. other causes of abnormal line loss.
Table 1 shows the transformer-area line loss analysis statistics produced by the abnormal electricity consumption data detection system for March 2019:
As Table 1 shows, 694 transformer areas currently meet the line loss rate standard, about 99% of all transformer areas within the management scope, and the low coverage rate of collection devices is the main factor affecting the overall progress of transformer-area management and control. Further detailed analysis shows that the main reason for the low collection coverage rate is the low installation rate of collection devices for non-resident users in most transformer areas; once this cause is identified, the installation plan for collection devices should be adjusted. Inaccurate user-transformer relations rank second in their influence on transformer-area management: of the 456 transformer areas with 100% collection coverage, 312 have accurate user-transformer relations, an accuracy rate of 68%. Investigation of the 144 areas with inaccurate relations found that the main cause is data loss in some old transformer areas, and the second is that large load changes during operation were not reflected in the data in time. When arranging the installation of collection devices, attention should therefore also be paid to checking user-transformer relations, and a transformer-area/customer bidirectional identifier can assist on-site checking of these relations.
The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any manner; those skilled in the art can readily practice the invention as shown and described in the drawings and detailed description herein; however, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims; meanwhile, any equivalent changes, modifications and variations of the above embodiments according to the actual techniques of the present invention are also within the scope of the protection of the technical solution of the present invention.

Claims (10)

1. An abnormal electricity consumption data detection method is characterized by comprising the following steps:
data acquisition: collecting electricity consumption data by means of electricity consumption information collection;
data cleaning: cleaning the collected electricity consumption data and detecting the types of dirty data in it to obtain effective electricity consumption data, the types of dirty data including missing values, duplicate values, extreme (maximum and minimum) values, load spikes, and sudden negative values;
data dimension reduction: performing feature dimension reduction on the effective electricity consumption data using daily load characteristic indexes, the daily load characteristic indexes including the load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate, and valley-period load rate;
model building: building an isolation forest from multiple isolation trees, establishing a first analysis model with the isolation forest algorithm, and evaluating the model with evaluation curves;
abnormal user screening: screening target data with the first analysis model, mining the screened data, and identifying users with abnormal electricity consumption.
2. The abnormal electricity consumption data detection method according to claim 1, wherein the electricity consumption information collection comprises cloud storage, which stores the electricity consumption data in a distributed manner on multiple independent storage servers, the storage servers being of three types: metadata management, volume management, and block data management.
3. The abnormal electricity consumption data detection method according to claim 1 or 2, wherein the data cleaning step further comprises filling missing values according to the periodic fluctuation characteristics of the power load with the following formula:
where $X_i$ denotes the power load at the current moment, $i$ is the moment at which load data is missing, taking values from 1 to 24, and $a_1$ and $a_2$ are the weighting coefficients for the loads at the same moment on the preceding and following days and at the two time points before and after the current moment.
4. The abnormal electricity consumption data detection method according to claim 3, further comprising, before the data acquisition step:
establishing a management scheme: establishing transformer-area line loss management indexes whose status identifiers include a coverage class, a user-transformer relation class, a collectability class, a data class, and a line loss class; identifying the status of the electricity consumption data collected from multiple transformer areas; and taking corresponding control measures for different statuses to form a transformer-area line loss management scheme.
5. The abnormal electricity consumption data detection method according to claim 1, wherein the model building step further comprises evaluating the model with a receiver operating characteristic (ROC) curve, the area under the curve (AUC), a cumulative recall curve, and a P-R curve with precision as the ordinate and recall as the abscissa.
6. The abnormal electricity consumption data detection method according to claim 1 or 5, wherein the isolation forest algorithm comprises a first-stage algorithm and a second-stage algorithm: the first stage constructs multiple isolation trees to form an isolation forest, and the second stage evaluates test data with the generated isolation forest and calculates anomaly scores for the detected data.
7. An electronic device, comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program comprising instructions for carrying out the method of claim 1.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, performs the method according to claim 1.
9. An abnormal electricity consumption data detection system, characterized by comprising a data acquisition module, a data cleaning module, a data dimension reduction module, a model building module, and an abnormal user screening module, wherein:
the data acquisition module is used to collect electricity consumption data by means of electricity consumption information collection;
the data cleaning module is used to clean the collected electricity consumption data and detect the types of dirty data in it to obtain effective electricity consumption data, the types of dirty data including missing values, duplicate values, extreme (maximum and minimum) values, load spikes, and sudden negative values;
the data dimension reduction module is used to perform feature dimension reduction on the effective electricity consumption data using daily load characteristic indexes, the daily load characteristic indexes including the load rate, peak-valley difference rate, maximum utilization hour rate, peak-period load rate, flat-period load rate, and valley-period load rate;
the model building module is used to build an isolation forest from multiple isolation trees, establish a first analysis model with the isolation forest algorithm, and evaluate the model with evaluation curves;
the abnormal user screening module is used to screen target data with the first analysis model, mine the screened data, and identify users with abnormal electricity consumption.
10. The abnormal electricity consumption data detection system according to claim 1, further comprising a management scheme establishment module, which is used to establish transformer-area line loss management indexes whose status identifiers include a coverage class, a user-transformer relation class, a collectability class, a data class, and a line loss class; to identify the status of the electricity consumption data collected from multiple transformer areas; and to take corresponding control measures for different statuses to form a transformer-area line loss management scheme;
the data acquisition module comprises a cloud storage unit, which is used to store the electricity consumption data in a distributed manner on multiple independent storage servers, the storage servers being of three types: metadata management, volume management, and block data management;
the data cleaning module comprises a missing value filling unit, which fills missing values according to the periodic fluctuation characteristics of the power load with the following formula:
where $X_i$ denotes the power load at the current moment, $i$ is the moment at which load data is missing, taking values from 1 to 24, and $a_1$ and $a_2$ are the weighting coefficients for the loads at the same moment on the preceding and following days and at the two time points before and after the current moment.
CN201910641996.7A 2019-07-16 2019-07-16 A kind of exception electricity consumption data detection method, system, equipment, storage medium Pending CN110503570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910641996.7A CN110503570A (en) 2019-07-16 2019-07-16 A kind of exception electricity consumption data detection method, system, equipment, storage medium

Publications (1)

Publication Number Publication Date
CN110503570A true CN110503570A (en) 2019-11-26

Family

ID=68586132

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657288A (en) * 2017-10-26 2018-02-02 国网冀北电力有限公司 A kind of power scheduling flow data method for detecting abnormality based on isolated forest algorithm
CN108011782A (en) * 2017-12-06 2018-05-08 北京百度网讯科技有限公司 Method and apparatus for pushing warning information
CN108985632A (en) * 2018-07-16 2018-12-11 国网上海市电力公司 A kind of electricity consumption data abnormality detection model based on isolated forest algorithm
CN110189232A (en) * 2019-05-14 2019-08-30 三峡大学 Power information based on isolated forest algorithm acquires data exception analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191126