CN106909933B - A three-stage multi-view feature fusion method for electricity theft classification and prediction - Google Patents


Info

Publication number
CN106909933B
Authority
CN
China
Prior art keywords
stealing
feature
client
cluster
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710036718.XA
Other languages
Chinese (zh)
Other versions
CN106909933A (en
Inventor
欧阳志友
岳东
薛禹胜
窦春霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
State Grid Electric Power Research Institute
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201710036718.XA
Publication of CN106909933A
Application granted
Publication of CN106909933B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R22/00 Arrangements for measuring time integral of electric power or current, e.g. electricity meters
    • G01R22/06 Arrangements for measuring time integral of electric power or current, e.g. electricity meters by electronic methods
    • G01R22/061 Details of electronic electricity meters
    • G01R22/066 Arrangements for avoiding or indicating fraudulent use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Power Engineering (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a three-stage multi-view feature fusion method for classifying and predicting customers' electricity consumption behavior. In the first stage, the customer electricity data to be analyzed is taken as the test set, and the missing values in daily power consumption, same-day meter reading, and previous-day meter reading are filled with "-1" and with "0", respectively, forming two preprocessed datasets. In the second stage, for each preprocessed dataset, features are extracted from different views, the features extracted from all views are merged, and several different machine learning classification algorithms are applied to obtain the electricity theft probability of each customer in the training set and the test set. In the third stage, the outputs of the second stage are fed to a linear model and to a tree model, and the two predictions are averaged to obtain the final predicted theft probability. Building on existing stacking-based ensemble learning, the invention adds data diversity, model diversity, and overfitting control, and thereby predicts customer theft probability more accurately.

Description

A three-stage multi-view feature fusion method for electricity theft classification and prediction
Technical field
The present invention relates to machine learning methods for classifying and predicting customer electricity consumption behavior, and more particularly to a three-stage multi-view feature fusion method for electricity theft classification and prediction.
Background
Social and economic development has increased electricity consumption year by year, and, driven by profit, abnormal electricity use by customers, that is, electricity theft, has also become increasingly serious. Electricity theft not only causes heavy economic losses to power supply enterprises but also seriously disrupts the normal order of power supply and use. According to statistics from State Grid Corporation of China, losses caused by customer electricity theft in recent years have reached upwards of ten million yuan. In recent years, theft has also evolved from crude tampering into high-tech theft characterized by intelligent devices, specialized means, concealed behavior, and large-scale operation, which makes anti-theft work considerably harder. With the upgrading of the power system and the spread of smart power equipment, grid companies can collect massive amounts of customer electricity behavior data and equipment monitoring data in real time, which provides a basis for predicting customer theft behavior through big data analysis. Predicting customer theft probability with big data analysis allows anti-theft monitoring and analysis to be carried out scientifically, improves the efficiency of anti-theft work, and reduces the time and cost of theft analysis.
When analyzing the electricity consumption behavior of a large number of customers, the customer base is huge and the historical consumption data is often severely incomplete. Existing machine learning methods therefore face challenges in missing-value handling, feature extraction, feature selection, and model fusion: they demand substantial computing resources and also a great deal of time to combine and select among hundreds or thousands of feature dimensions. Moreover, a single classification algorithm rarely yields a good prediction of customer theft probability. A method that better tolerates missing data, reduces the feature selection effort, and improves prediction accuracy therefore meets a strong social need and has considerable economic value.
Summary of the invention
The technical problem to be solved by the present invention is to address the defects described in the background by providing a three-stage multi-view feature fusion method for electricity theft classification and prediction.
To solve the above technical problem, the present invention adopts the following technical solution:
A three-stage multi-view feature fusion method for electricity theft classification and prediction, comprising the following steps:
Step 1), take the customer electricity data to be analyzed as the test set, and fill the missing values in daily power consumption, same-day meter reading, and previous-day meter reading with "-1" and with "0", respectively, forming two preprocessed datasets;
Step 2), for each preprocessed dataset:
Step 2.1), select at least two of the three views (time-window statistics, abnormal-change statistics, and time-series analysis) and extract features from each; the set of feature values extracted from each view forms an individual feature cluster; the individual feature clusters are then merged into one feature cluster, and the individual feature clusters together with the merged feature cluster form the feature cluster set of that preprocessed dataset;
Step 2.2), for each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set;
Step 3), for each customer in the training set and the test set, collect all the theft probabilities predicted for that customer across the two preprocessed datasets into that customer's set of predicted theft probabilities;
Step 4), using the sets of predicted theft probabilities of all customers in the training set and the test set as features, predict on the test set with a tree classification model and with a linear classification model, respectively, to obtain the final predicted theft probability of each customer in the customer electricity data to be analyzed;
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft probability threshold; classify customers whose final predicted theft probability exceeds the threshold as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold as normal customers.
As a further refinement of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, when all three views are selected in step 2.1), the detailed steps for extracting features are:
Step 2.1.1), compute monthly power consumption statistics for each customer and use them as the time-window feature cluster; the statistics include the maximum, minimum, mean, and standard deviation of power consumption;
Step 2.1.2), count the abnormal numerical changes in daily power consumption, same-day meter reading, and previous-day meter reading, and use the counts as the abnormal-change feature cluster; the abnormal changes include a same-day meter reading lower than the previous-day meter reading, a missing daily power consumption, a missing same-day meter reading, a missing previous-day meter reading, and a negative daily power consumption;
Step 2.1.3), for each customer, arrange the daily power consumption in chronological order as a time series, and extract time-series features including the number of peaks, the number of troughs, the mean, the quantiles, the seasonal trend, and the periodic trend, as the time-series feature cluster;
Step 2.1.4), merge the time-window feature cluster, the abnormal-change feature cluster, and the time-series feature cluster into one feature cluster;
Step 2.1.5), take the set formed by the time-window feature cluster, the abnormal-change feature cluster, the time-series feature cluster, and the merged feature cluster as the feature cluster set of the preprocessed dataset.
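Steps 2.1.1) to 2.1.5) can be sketched as follows. This is a minimal illustration with pandas, not the patent's implementation: the column names `CONS_NO` and `DATE`, the function names, and the `fill_value` convention for recognizing filled (missing) entries are assumptions, and the seasonal and periodic trend features are omitted for brevity.

```python
import numpy as np
import pandas as pd

def time_window_features(df):
    """Monthly max/min/mean/std of daily consumption, one row per customer."""
    monthly = df.assign(MONTH=df["DATE"].dt.strftime("%Y-%m"))
    stats = monthly.groupby(["CONS_NO", "MONTH"])["KWH"].agg(["max", "min", "mean", "std"])
    wide = stats.unstack("MONTH")  # columns become (statistic, month) pairs
    wide.columns = [f"tw_{stat}_{month}" for stat, month in wide.columns]
    return wide

def abnormal_change_features(df, fill_value):
    """Per-customer counts of abnormal values; filled entries mark missing data."""
    flags = pd.DataFrame({
        "CONS_NO": df["CONS_NO"],
        "reading_drop": df["KWH_READING"] < df["KWH_READING1"],
        "kwh_missing": df["KWH"] == fill_value,
        "reading_missing": df["KWH_READING"] == fill_value,
        "prev_reading_missing": df["KWH_READING1"] == fill_value,
        "kwh_negative": df["KWH"] < 0,
    })
    return flags.groupby("CONS_NO").sum().add_prefix("abn_")

def time_series_features(df):
    """Peak/trough counts, mean and quartiles of each customer's daily series."""
    rows = {}
    for cons_no, group in df.sort_values("DATE").groupby("CONS_NO"):
        x = group["KWH"].to_numpy(dtype=float)
        slope = np.diff(x)
        rows[cons_no] = {
            "ts_peaks": int(np.sum((slope[:-1] > 0) & (slope[1:] < 0))),
            "ts_troughs": int(np.sum((slope[:-1] < 0) & (slope[1:] > 0))),
            "ts_mean": x.mean(),
            "ts_q25": np.quantile(x, 0.25),
            "ts_q75": np.quantile(x, 0.75),
        }
    return pd.DataFrame.from_dict(rows, orient="index").rename_axis("CONS_NO")

def build_feature_cluster_set(df, fill_value):
    """Three single-view clusters plus their column-wise merge."""
    tw = time_window_features(df)
    abn = abnormal_change_features(df, fill_value)
    ts = time_series_features(df)
    merged = pd.concat([tw, abn, ts], axis=1)
    return {"time_window": tw, "abnormal_change": abn, "time_series": ts, "merged": merged}

# Hypothetical preprocessed data (missing values already filled with -1).
demo = pd.DataFrame({
    "CONS_NO": [1] * 4 + [2] * 4,
    "DATE": pd.to_datetime(["2016-01-30", "2016-01-31", "2016-02-01", "2016-02-02"] * 2),
    "KWH": [1.0, 3.0, 2.0, 4.0, 0.0, -1.0, 5.0, 5.0],
    "KWH_READING": [11.0, 14.0, 16.0, 20.0, 7.0, -1.0, 12.0, 17.0],
    "KWH_READING1": [10.0, 11.0, 14.0, 16.0, 7.0, 7.0, -1.0, 12.0],
})
clusters = build_feature_cluster_set(demo, fill_value=-1)
```

Each of the four entries in `clusters` is then passed separately to the classifiers of step 2.2), so feature selection never has to span more than one view at a time.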
As a further refinement of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the detailed steps of step 2.2) are:
For each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set, as follows:
Step 2.2.1), split the training set into N parts by random sampling of customers;
Step 2.2.2), for each part of the training data:
use that part as the sub-validation set and the union of the remaining N-1 parts as the sub-training set; then, for each feature cluster in the feature cluster set in turn, use at least one binary classification algorithm to predict the theft probability of the customers in that part of the training data and in the test set;
Step 2.2.3), merge the predictions for all parts of the training data from step 2.2.2) to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), for each customer in the test set, average the theft probabilities predicted in step 2.2.2) across the N parts to obtain the predicted theft probability of each customer in the test set.
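Steps 2.2.1) to 2.2.4) amount to out-of-fold prediction, which can be sketched as below for one feature cluster and one classifier. This is an illustrative sketch under stated assumptions: scikit-learn's `KFold` and `LogisticRegression` stand in for whichever splitter and binary classifier are actually used, the function name `out_of_fold_predict` is hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def out_of_fold_predict(make_model, X_train, y_train, X_test, n_splits=5, seed=0):
    """Return (train-set predictions, test-set predictions) for one feature cluster."""
    oof = np.zeros(len(X_train))        # one prediction per training customer
    test_pred = np.zeros(len(X_test))   # running average over the N fold models
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, val_idx in folds.split(X_train):
        model = make_model()
        model.fit(X_train[fit_idx], y_train[fit_idx])                  # N-1 parts
        oof[val_idx] = model.predict_proba(X_train[val_idx])[:, 1]     # held-out part
        test_pred += model.predict_proba(X_test)[:, 1] / n_splits      # step 2.2.4
    return oof, test_pred

# Synthetic stand-in for one feature cluster (rows are customers).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (X_train[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)
X_test = rng.normal(size=(20, 3))

oof, test_pred = out_of_fold_predict(
    lambda: LogisticRegression(max_iter=1000), X_train, y_train, X_test
)
```

Because every training customer is predicted by a model that never saw that customer during fitting, the resulting probabilities can serve as training features for the next stage without leaking labels.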
As a further refinement of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the binary classification algorithms used in step 2.2) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
As a further refinement of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
As a further refinement of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the linear classification model in step 4) is one of XGBoost with booster set to gblinear, LogisticRegression, and Linear Regression.
Compared with the prior art, the above technical solution has the following technical effects:
1. The method only needs to consider feature selection within the feature set of a single view, which avoids the large amount of computing resources and time that existing methods need to perform feature selection over thousands of feature dimensions;
2. Compared with existing machine learning or ensemble learning methods, the method is more effective on real-world datasets with large amounts of missing data; by increasing the diversity of the datasets and of the models and by countering overfitting, it can improve prediction accuracy while reducing computation;
3. The method can be implemented without modifying existing classification and prediction algorithms for customer electricity consumption behavior, so existing algorithms can be used as they are.
Description of the drawings
Fig. 1 is a schematic diagram of the principle of three-stage multi-view feature fusion in the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing:
The invention discloses a three-stage multi-view feature fusion method for electricity theft classification and prediction, comprising the following steps:
Step 1), take the customer electricity data to be analyzed as the test set, and fill the missing values in daily power consumption, same-day meter reading, and previous-day meter reading with "-1" and with "0", respectively, forming two preprocessed datasets;
Step 2), for each preprocessed dataset:
Step 2.1), select at least two of the three views (time-window statistics, abnormal-change statistics, and time-series analysis) and extract features from each; the set of feature values extracted from each view forms an individual feature cluster; the individual feature clusters are then merged into one feature cluster, and the individual feature clusters together with the merged feature cluster form the feature cluster set of that preprocessed dataset;
Step 2.2), for each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set;
Step 3), for each customer in the training set and the test set, collect all the theft probabilities predicted for that customer across the two preprocessed datasets into that customer's set of predicted theft probabilities;
Step 4), using the sets of predicted theft probabilities of all customers in the training set and the test set as features, predict on the test set with a tree classification model and with a linear classification model, respectively, to obtain the final predicted theft probability of each customer in the customer electricity data to be analyzed;
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft probability threshold; classify customers whose final predicted theft probability exceeds the threshold as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold as normal customers.
When all three views are selected in step 2.1), the detailed steps for extracting features are:
Step 2.1.1), compute monthly power consumption statistics for each customer and use them as the time-window feature cluster; the statistics include the maximum, minimum, mean, and standard deviation of power consumption;
Step 2.1.2), count the abnormal numerical changes in daily power consumption, same-day meter reading, and previous-day meter reading, and use the counts as the abnormal-change feature cluster; the abnormal changes include a same-day meter reading lower than the previous-day meter reading, a missing daily power consumption, a missing same-day meter reading, a missing previous-day meter reading, and a negative daily power consumption;
Step 2.1.3), for each customer, arrange the daily power consumption in chronological order as a time series, and extract time-series features including the number of peaks, the number of troughs, the mean, the quantiles, the seasonal trend, and the periodic trend, as the time-series feature cluster;
Step 2.1.4), merge the time-window feature cluster, the abnormal-change feature cluster, and the time-series feature cluster into one feature cluster;
Step 2.1.5), take the set formed by the time-window feature cluster, the abnormal-change feature cluster, the time-series feature cluster, and the merged feature cluster as the feature cluster set of the preprocessed dataset.
The detailed steps of step 2.2) are:
For each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set, as follows:
Step 2.2.1), split the training set into N parts by random sampling of customers;
Step 2.2.2), for each part of the training data:
use that part as the sub-validation set and the union of the remaining N-1 parts as the sub-training set; then, for each feature cluster in the feature cluster set in turn, use at least one binary classification algorithm to predict the theft probability of the customers in that part of the training data and in the test set;
Step 2.2.3), merge the predictions for all parts of the training data from step 2.2.2) to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), for each customer in the test set, average the theft probabilities predicted in step 2.2.2) across the N parts to obtain the predicted theft probability of each customer in the test set.
The binary classification algorithms used in step 2.2) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
The tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
The linear classification model in step 4) is one of XGBoost with booster set to gblinear, LogisticRegression, and Linear Regression.
As shown in Fig. 1, in one embodiment of the present invention two preprocessed datasets are used; for simplicity, only two views are selected for feature extraction, namely time-window statistical features and abnormal-change features; two classification algorithms are selected; and the data is divided into 5 parts (N=5) for feature fusion.
This embodiment comprises the following steps:
Step 1), for the data to be predicted, fill the missing daily power consumption (KWH), same-day meter reading (KWH_READING), and previous-day meter reading (KWH_READING1) with -1 and with 0, respectively, generating two preprocessed files PD1 and PD2.
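A minimal pandas sketch of this filling step is shown below. The column names `KWH`, `KWH_READING`, and `KWH_READING1` come from the embodiment; the customer-number column `CONS_NO`, the helper `fill_missing`, and the sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily records with some missing meter values.
raw = pd.DataFrame({
    "CONS_NO": [1, 1, 2, 2],  # customer number (assumed column name)
    "KWH": [3.2, np.nan, 0.0, 1.5],
    "KWH_READING": [103.2, np.nan, 50.0, 51.5],
    "KWH_READING1": [100.0, 103.2, np.nan, 50.0],
})
METER_COLS = ["KWH", "KWH_READING", "KWH_READING1"]

def fill_missing(df, value):
    """Return a copy in which missing meter values are replaced by `value`."""
    out = df.copy()
    out[METER_COLS] = out[METER_COLS].fillna(value)
    return out

PD1 = fill_missing(raw, -1)  # missing values become -1
PD2 = fill_missing(raw, 0)   # missing values become 0
```

Keeping both copies means that later stages see the missing entries once as an out-of-range marker (-1) and once as an ordinary zero, which is the source of the data diversity the method relies on.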
Step 2), extract features from PD1 and from PD2 under two different views, time-window statistical features and abnormal-change features, obtaining V11, V12, V21, and V22, the union V1A of V11 and V12, and the union V2A of V21 and V22:
Step 2.1), after grouping by customer, divide the time axis into monthly time windows and compute statistics of the daily power consumption within each window, including the maximum, minimum, median, mean, number of zeros, number of consecutive zeros, and quantiles, as the time-window features. Extracting the time-window features from PD1 and PD2 yields V11 and V21;
Step 2.2), after grouping by customer and sorting by time in ascending order, count the occurrences of negative daily power consumption, zero daily power consumption, a same-day meter reading lower than the previous-day meter reading, and so on, as the abnormal-change features. Extracting the abnormal-change features from PD1 and PD2 yields V12 and V22;
Step 2.3), merge the feature sets of the views of PD1, that is, merge V11 and V12, to obtain the merged feature set V1A; merge the feature sets of the views of PD2, that is, merge V21 and V22, to obtain the merged feature set V2A;
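For a single customer and a single monthly window of PD1, the two views and their merge might look like the sketch below. The function names and feature keys are hypothetical, "number of consecutive zeros" is interpreted here as the longest run of zeros (an assumption), and the quantile features are left out to keep the example short.

```python
import numpy as np

def longest_zero_run(values):
    """Length of the longest run of consecutive zeros."""
    longest = current = 0
    for v in values:
        current = current + 1 if v == 0 else 0
        longest = max(longest, current)
    return longest

def time_window_view(kwh):
    """Statistics of daily consumption within one time window."""
    x = np.asarray(kwh, dtype=float)
    return {
        "tw_max": float(x.max()),
        "tw_min": float(x.min()),
        "tw_median": float(np.median(x)),
        "tw_mean": float(x.mean()),
        "tw_zero_count": int(np.sum(x == 0)),
        "tw_zero_run": longest_zero_run(x),
    }

def abnormal_change_view(kwh, reading, prev_reading):
    """Counts of abnormal values; inputs must already be sorted by date."""
    kwh = np.asarray(kwh, dtype=float)
    reading = np.asarray(reading, dtype=float)
    prev_reading = np.asarray(prev_reading, dtype=float)
    return {
        "abn_negative_kwh": int(np.sum(kwh < 0)),
        "abn_zero_kwh": int(np.sum(kwh == 0)),
        "abn_reading_drop": int(np.sum(reading < prev_reading)),
    }

# Hypothetical six days from PD1 (missing values already filled with -1).
kwh = [0, 0, 2.5, -1, 0, 3.0]
reading = [10, 10, 12.5, -1, 12.5, 15.5]
prev_reading = [10, 10, 10, 12.5, -1, 12.5]

V11 = time_window_view(kwh)
V12 = abnormal_change_view(kwh, reading, prev_reading)
V1A = {**V11, **V12}  # merged feature set: union of the two views
```

Note that in PD1 a filled value of -1 shows up as a negative consumption and as a reading drop, whereas the same entry in PD2 would be counted as a zero; the two preprocessed files therefore yield different feature values from the same raw data.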
Step 3), for each feature set, use two different classification and prediction algorithms to predict the theft probability of the customers in the training set and the test set:
Step 3.1), for each feature set, split the training set into 5 parts (N=5).
Step 3.2), take any 4 parts of the training data, train a model with the classification algorithm, and then predict the theft probability of the customers in the remaining part of the training data and in the test data;
Step 3.3), merge the theft probability predictions for the training data obtained in step 3.2) to obtain the theft probability of every customer in the whole training set; average the theft probability predictions for the test-set customers obtained in step 3.2) to obtain the predicted theft probability of the customers in the test set;
Step 3.4), with classification algorithm M, apply steps 3.1), 3.2), and 3.3) to each of the feature sets V11, V12, V1A, V21, V22, and V2A, obtaining the theft prediction probabilities M11, M12, M1A, M21, M22, and M2A for the respective feature sets; with classification algorithm N (a different algorithm from M; this N is unrelated to the number of parts N=5), apply steps 3.1), 3.2), and 3.3) to each of the same feature sets, obtaining the theft prediction probabilities N11, N12, N1A, N21, N22, and N2A;
Step 4), use the theft probability predictions for the training-set customers from step 3) as the training input features and those for the test-set customers as the test input features; predict the theft probability of the test-set customers with a tree classification model and with a linear classification model, and average the two predictions to obtain the final theft prediction for each customer:
Step 4.1), take the prediction probabilities of the base models from step 3) as features: join M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, and N2A using the customer number as the primary key, and classify with the linear classification algorithm LogisticRegressionClassifier, obtaining the predicted theft probability LA of the customers in the test set;
Step 4.2), take the prediction probabilities of the base models from step 3) as features: join M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, and N2A using the customer number as the primary key, and classify with the tree classification algorithm XGBoost, obtaining the predicted theft probability TA of the customers in the test set;
Step 4.3), average the theft probability predictions from step 4.1) and step 4.2) to obtain each customer's final predicted theft probability R;
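The third stage can be sketched as follows. This is an assumption-laden illustration rather than the embodiment's code: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost (gradient boosting decision trees are among the tree models the description lists), `LogisticRegression` plays the linear model, and the twelve stage-2 probability columns are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test, n_base = 200, 50, 12  # 12 columns: M11..M2A and N11..N2A

# Synthetic stage-2 outputs: one probability column per (algorithm, feature set).
y_train = rng.integers(0, 2, size=n_train)
P_train = np.clip(
    0.4 * y_train[:, None] + rng.uniform(0.1, 0.5, size=(n_train, n_base)), 0, 1
)
P_test = rng.uniform(0.0, 1.0, size=(n_test, n_base))

# Step 4.1: linear model on the stacked probabilities.
linear_model = LogisticRegression(max_iter=1000).fit(P_train, y_train)
LA = linear_model.predict_proba(P_test)[:, 1]

# Step 4.2: tree model on the same features (stand-in for XGBoost).
tree_model = GradientBoostingClassifier(random_state=0).fit(P_train, y_train)
TA = tree_model.predict_proba(P_test)[:, 1]

# Step 4.3: the final probability is the plain average of the two.
R = (LA + TA) / 2
```

Averaging a low-variance linear model with a more flexible tree model is a simple way to temper the overfitting that either might show on only twelve input features.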
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft probability threshold; classify customers whose final predicted theft probability exceeds the threshold as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold as normal customers.
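The thresholding rule of step 5) is a strict comparison, as the short sketch below shows. The threshold value 0.5 and the label strings are placeholders chosen for illustration; the description only says that the threshold is preset.

```python
import numpy as np

def classify_customers(final_prob, threshold=0.5):
    """Label a customer 'theft' only when the probability exceeds the threshold."""
    final_prob = np.asarray(final_prob, dtype=float)
    return np.where(final_prob > threshold, "theft", "normal")

labels = classify_customers([0.10, 0.50, 0.73], threshold=0.5)
```

A customer whose probability equals the threshold exactly is treated as normal, matching the "less than or equal to" wording of the step.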
The basic principle of the present invention is as follows. First, the missing values of the customer electricity data to be analyzed are filled in different ways to generate several preprocessed datasets, which increases data diversity so that subsequent feature extraction and modeling can make better use of the information implied by the missing data. Second, during feature extraction, feature sets are built for each preprocessed dataset from several views, such as time-window statistics, abnormal-change statistics, and time-series features, and the features extracted from these views are also merged into one feature set, which characterizes each preprocessed dataset more fully. Because the feature sets are built from different views, they differ substantially from one another, which avoids interference between features and reduces the computation needed for feature selection. Furthermore, because a merged feature set combining the feature clusters of the different views is built for each preprocessed dataset, the feature sets of the different views can be fused more effectively, which benefits the final model fusion. During model construction, several existing mainstream classification algorithms are used, including XGBoost, Gradient Boosting Decision Tree, and Neural Network, which increases algorithm diversity so that combinations of different algorithms can characterize the data from different angles. Finally, taking the average of the predicted probabilities of the tree model and the linear model as the final prediction helps avoid model overfitting. The method achieves more accurate classification and prediction of customer theft probability with fewer resources and therefore has better practical engineering value.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, are not to be interpreted in an idealized or overly formal sense.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and do not limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A three-stage, multi-view feature-fusion method for classifying and predicting electricity theft, characterized in that it comprises the following steps:
Step 1): take the customer electricity-consumption data to be analyzed as the test set, and fill the missing values in daily consumption, same-day meter reading, and previous-day meter reading with "-1" and with "0" respectively, producing two copies of preprocessed data;
Step 2): for each copy of preprocessed data:
Step 2.1): select at least two of three views (time-window statistics, abnormal-mutation statistics, and time-series analysis) and extract features from them; the set of feature values extracted from each view forms an individual feature cluster; the extracted individual feature clusters are then merged into one feature cluster, and the set consisting of every individual feature cluster and the merged feature cluster is taken as the feature-cluster set of that preprocessed data;
Step 2.2): for each feature cluster in the feature-cluster set, use at least one binary classification algorithm with that feature cluster to predict the electricity-theft probability of each customer in the training set of preset customer electricity-consumption data and in the test set;
Step 3): for each customer in the training set and the test set, collect all theft probabilities predicted for that customer across the two copies of preprocessed data into that customer's set of predicted theft probabilities;
Step 4): using the predicted theft-probability sets of all customers in the training set and the test set as features, predict on the test set with a tree classification model and with a linear classification model separately, and average the two resulting probability values to obtain the final predicted theft probability of each customer in the customer electricity-consumption data to be analyzed;
Step 5): compare the final predicted theft probability of each customer in the customer electricity-consumption data to be analyzed with a preset theft-probability threshold; classify customers whose final predicted theft probability exceeds the threshold as electricity-theft customers, and customers whose final predicted theft probability is less than or equal to the threshold as normal customers.
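As a rough illustration of steps 1 and 3 to 5, the sketch below fills missing values two ways, flattens one customer's stage-two probabilities into a feature vector, and averages two model outputs before thresholding. The field names, the helper names (`preprocess`, `prediction_set`, `fuse_and_classify`), the key format, and the 0.5 default threshold are all assumptions made for illustration; the tree-model and linear-model outputs are passed in as plain dictionaries rather than produced by a real model.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical field names for the three columns named in step 1.
FIELDS = ("daily_kwh", "reading_today", "reading_prev")

Record = Dict[str, Optional[float]]


def fill_missing(records: Sequence[Record], fill_value: float) -> List[Dict[str, float]]:
    """Return a copy of the records with missing (None) values replaced."""
    return [
        {f: fill_value if r.get(f) is None else r[f] for f in FIELDS}
        for r in records
    ]


def preprocess(records: Sequence[Record]) -> Dict[str, List[Dict[str, float]]]:
    """Step 1: build the two preprocessed copies, filled with -1 and with 0."""
    return {
        "fill_-1": fill_missing(records, -1.0),
        "fill_0": fill_missing(records, 0.0),
    }


def prediction_set(per_copy_probs: Dict[str, Dict[str, float]]) -> List[float]:
    """Step 3: flatten one customer's stage-two probabilities into a vector.

    per_copy_probs maps a copy name to {"<cluster>/<algorithm>": probability};
    keys are sorted so every customer gets the same feature order.
    """
    return [
        per_copy_probs[copy][key]
        for copy in sorted(per_copy_probs)
        for key in sorted(per_copy_probs[copy])
    ]


def fuse_and_classify(
    tree_probs: Dict[str, float],
    linear_probs: Dict[str, float],
    threshold: float = 0.5,  # assumed value; the claim only says "preset"
) -> Dict[str, str]:
    """Steps 4 and 5: average the two model outputs, then apply the threshold."""
    labels: Dict[str, str] = {}
    for customer_id, tree_p in tree_probs.items():
        final = (tree_p + linear_probs[customer_id]) / 2.0
        labels[customer_id] = "theft" if final > threshold else "normal"
    return labels
```

A customer whose averaged probability equals the threshold exactly is labeled normal, which matches the "less than or equal to" wording of step 5.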
2. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft according to claim 1, characterized in that, when all three views are selected for feature extraction, the detailed steps of step 2.1) are:
Step 2.1.1): compute statistics of each customer's consumption for every month and take them as the time-window feature cluster; the consumption statistics include the maximum, minimum, mean, and standard deviation of consumption;
Step 2.1.2): compute statistics on abrupt numerical changes in daily consumption, same-day meter reading, and previous-day meter reading, and take them as the mutation feature cluster; the abrupt changes include a same-day meter reading lower than the previous-day meter reading, missing daily consumption, missing same-day meter reading, missing previous-day meter reading, and negative daily consumption;
Step 2.1.3): for each customer, arrange daily consumption in chronological order as a time series and extract its time-series features, namely the number of peaks, the number of troughs, the mean, the quantiles, the seasonal trend, and the periodic trend, as the time-series feature cluster;
Step 2.1.4): merge the time-window feature cluster, the mutation feature cluster, and the time-series feature cluster into one feature cluster;
Step 2.1.5): take the set formed by the time-window feature cluster, the mutation feature cluster, the time-series feature cluster, and the merged feature cluster as the feature-cluster set of the preprocessed data.
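The three views and their merge in steps 2.1.1) to 2.1.5) can be sketched for a single customer as follows. This is a minimal illustration using only the Python standard library: the function names, the feature names, and the input layout (daily values grouped by month, with `None` marking a missing raw value) are assumptions, and the seasonal and periodic trend features are omitted because the claim does not say how they are computed.

```python
from statistics import mean, pstdev, quantiles
from typing import Dict, List, Optional, Sequence


def window_features(monthly: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Time-window view: max, min, mean and standard deviation per month."""
    feats: Dict[str, float] = {}
    for month, values in sorted(monthly.items()):
        feats[f"{month}_max"] = max(values)
        feats[f"{month}_min"] = min(values)
        feats[f"{month}_mean"] = mean(values)
        feats[f"{month}_std"] = pstdev(values)
    return feats


def mutation_features(
    daily: Sequence[Optional[float]],
    today: Sequence[Optional[float]],
    prev: Sequence[Optional[float]],
) -> Dict[str, float]:
    """Mutation view: counts of the five anomalies listed in step 2.1.2)."""
    return {
        "reading_decreased": sum(
            1 for t, p in zip(today, prev)
            if t is not None and p is not None and t < p
        ),
        "daily_missing": sum(1 for v in daily if v is None),
        "today_missing": sum(1 for v in today if v is None),
        "prev_missing": sum(1 for v in prev if v is None),
        "daily_negative": sum(1 for v in daily if v is not None and v < 0),
    }


def series_features(series: Sequence[float]) -> Dict[str, float]:
    """Time-series view: peak count, trough count, mean and quartiles."""
    triples = list(zip(series, series[1:], series[2:]))
    q25, q50, q75 = quantiles(series, n=4)
    return {
        "peaks": sum(1 for a, b, c in triples if b > a and b > c),
        "troughs": sum(1 for a, b, c in triples if b < a and b < c),
        "mean": mean(series),
        "q25": q25, "q50": q50, "q75": q75,
    }


def build_feature_clusters(monthly, daily, today, prev, fill_value):
    """Return the feature-cluster set: three single-view clusters plus the merge."""
    # The series is built from the filled copy (fill_value is -1 or 0 in step 1).
    series: List[float] = [fill_value if v is None else v for v in daily]
    clusters = {
        "window": window_features(monthly),
        "mutation": mutation_features(daily, today, prev),
        "series": series_features(series),
    }
    merged = {f"{name}.{k}": v for name, c in clusters.items() for k, v in c.items()}
    clusters["merged"] = merged
    return clusters
```

Prefixing each merged key with its view name keeps the merged cluster free of name collisions, so each of the four clusters can be fed to a classifier on its own.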
3. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft according to claim 2, characterized in that the detailed steps of step 2.2) are:
For each feature cluster in the feature-cluster set, use at least one binary classification algorithm with that feature cluster to predict the electricity-theft probability of each customer in the training set of preset customer electricity-consumption data and in the test set, as follows:
Step 2.2.1): divide the training-set data into N parts of training data by random sampling of customers;
Step 2.2.2): for each part of training data:
take that part as the sub-validation set and the union of the remaining N-1 parts as the sub-training set; then, using each feature cluster in the feature-cluster set in turn, predict the theft probability of the customers in the sub-validation set and in the test set with at least one binary classification algorithm;
Step 2.2.3): merge the prediction results obtained in step 2.2.2) for all parts of training data to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4): for each customer in the test set, average the theft probabilities predicted in step 2.2.2) across all parts of training data to obtain the predicted theft probability of each customer in the test set.
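Steps 2.2.1) to 2.2.4) amount to out-of-fold prediction for one feature cluster and one algorithm. The sketch below is an illustration under stated assumptions: `out_of_fold_predict` and `fit_centroid` are hypothetical names, and `fit_centroid` is a toy nearest-centroid scorer standing in for the binary classifiers named in claim 4, chosen only so that the example runs without third-party libraries.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Sequence[Row]], List[float]]


def fit_centroid(X: Sequence[Row], y: Sequence[int]) -> Predictor:
    """Toy stand-in classifier: score by relative distance to class centroids."""
    def centroid(rows: List[Row]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == 0])

    def predict(rows: Sequence[Row]) -> List[float]:
        probs = []
        for r in rows:
            d_pos = sum((a - b) ** 2 for a, b in zip(r, pos))
            d_neg = sum((a - b) ** 2 for a, b in zip(r, neg))
            total = d_pos + d_neg
            probs.append(d_neg / total if total > 0 else 0.5)
        return probs

    return predict


def out_of_fold_predict(
    X_train: Sequence[Row],
    y_train: Sequence[int],
    X_test: Sequence[Row],
    fit: Callable[[Sequence[Row], Sequence[int]], Predictor],
    n_folds: int = 5,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Return (training-set predictions, fold-averaged test-set predictions)."""
    # Step 2.2.1: split customers into N parts by random sampling.
    idx = list(range(len(X_train)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]

    oof = [0.0] * len(X_train)
    test_sum = [0.0] * len(X_test)
    for val_idx in folds:
        # Step 2.2.2: one part validates, the other N-1 parts train.
        held_out = set(val_idx)
        tr_idx = [i for i in idx if i not in held_out]
        predict = fit([X_train[i] for i in tr_idx], [y_train[i] for i in tr_idx])
        # Step 2.2.3: each training customer is predicted exactly once.
        for i, p in zip(val_idx, predict([X_train[i] for i in val_idx])):
            oof[i] = p
        for j, p in enumerate(predict(X_test)):
            test_sum[j] += p
    # Step 2.2.4: average the N test-set predictions.
    return oof, [s / n_folds for s in test_sum]
```

Because every training customer is scored by a model that never saw that customer, the resulting probabilities can serve as features for the later fusion stage without leaking the labels.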
4. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft according to claim 3, characterized in that the binary classification algorithms used in step 2.2) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
5. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft according to claim 3, characterized in that the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
6. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft according to claim 3, characterized in that the linear classification model in step 4) is one of XGBoost with booster set to gblinear, LogisticRegression, and Linear Regression.
CN201710036718.XA 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features Expired - Fee Related CN106909933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710036718.XA CN106909933B (en) 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710036718.XA CN106909933B (en) 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features

Publications (2)

Publication Number Publication Date
CN106909933A CN106909933A (en) 2017-06-30
CN106909933B true CN106909933B (en) 2018-05-18

Family

ID=59206516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710036718.XA Expired - Fee Related CN106909933B (en) 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features

Country Status (1)

Country Link
CN (1) CN106909933B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107492043A (en) * 2017-09-04 2017-12-19 国网冀北电力有限公司电力科学研究院 stealing analysis method and device
CN107862347A (en) * 2017-12-04 2018-03-30 国网山东省电力公司济南供电公司 A kind of discovery method of the electricity stealing based on random forest
CN108490288B (en) * 2018-03-09 2019-04-16 华南师范大学 A kind of stealing detection method and system
CN108961215A (en) * 2018-06-05 2018-12-07 上海大学 Parkinson's disease assistant diagnosis system and method based on Multimodal medical image
CN109359674A (en) * 2018-09-27 2019-02-19 智庭(北京)智能科技有限公司 A kind of smart lock method for detecting abnormality based on multi-model blending
CN109858679A (en) * 2018-12-30 2019-06-07 国网浙江省电力有限公司 A kind of opposing electricity-stealing for the man-machine object of combination checks monitoring system and its working method
CN110119755A (en) * 2019-03-22 2019-08-13 国网浙江省电力有限公司信息通信分公司 Electricity method for detecting abnormality based on Ensemble learning model
CN111507507B (en) * 2020-03-24 2023-04-18 重庆森鑫炬科技有限公司 Big data-based monthly water consumption prediction method
CN112101420A (en) * 2020-08-17 2020-12-18 广东工业大学 Abnormal electricity user identification method for Stacking integration algorithm under dissimilar model
CN112232985B (en) * 2020-10-15 2023-02-28 国网天津市电力公司 Power distribution and utilization data monitoring method and device for ubiquitous power Internet of things
CN112485491A (en) * 2020-11-23 2021-03-12 国网北京市电力公司 Power stealing identification method and device
CN112561569B (en) * 2020-12-07 2024-02-27 上海明略人工智能(集团)有限公司 Dual-model-based store arrival prediction method, system, electronic equipment and storage medium
CN113128567A (en) * 2021-03-25 2021-07-16 云南电网有限责任公司 Abnormal electricity consumption behavior identification method based on electricity consumption data
CN113435513B (en) * 2021-06-28 2024-06-04 平安科技(深圳)有限公司 Deep learning-based insurance customer grouping method, device, equipment and medium
CN116954591B (en) * 2023-06-15 2024-02-23 天云融创数据科技(北京)有限公司 Generalized linear model training method, device, equipment and medium in banking field
CN117033916B (en) * 2023-07-10 2024-07-23 国网四川省电力公司营销服务中心 Power theft detection method based on neural network
CN116933986B (en) * 2023-09-19 2024-01-23 国网湖北省电力有限公司信息通信公司 Electric power data safety management system based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102866321A (en) * 2012-08-13 2013-01-09 广东电网公司电力科学研究院 Self-adaptive stealing-leakage prevention diagnosis method
CN103778567A (en) * 2014-01-21 2014-05-07 深圳供电局有限公司 Method and system for discriminating abnormal electricity utilization of user
CN105069476A (en) * 2015-08-10 2015-11-18 国网宁夏电力公司 Method for identifying abnormal wind power data based on two-stage integration learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070136082A1 (en) * 2005-12-14 2007-06-14 Southern Company Services, Inc. System and method for energy diversion investigation management

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Anomaly detection of power Consumption based on waveform feature recognition;Tang Yijia et al;《The 11th International Conference on Computer Science&Education 》;20161231;587-591 *
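Anti-electricity-theft analysis using big data technology; Chen Wenying et al.; Journal of Electronic Measurement and Instrumentation; 20161031; Vol. 30, No. 10; 1558-1566 *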

Also Published As

Publication number Publication date
CN106909933A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN106909933B (en) A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features
CN111738462B (en) Fault first-aid repair active service early warning method for electric power metering device
CN111612650B (en) DTW distance-based power consumer grouping method and system
CN110082699A (en) A kind of low-voltage platform area intelligent electric energy meter kinematic error calculation method and its system
CN110232203A (en) Knowledge distillation optimization RNN has a power failure prediction technique, storage medium and equipment in short term
CN110309884A (en) Electricity consumption data anomalous identification system based on ubiquitous electric power Internet of Things net system
CN111582548A (en) Power load prediction method based on multivariate user behavior portrait
CN112149873A (en) Low-voltage transformer area line loss reasonable interval prediction method based on deep learning
CN111738331A (en) User classification method and device, computer-readable storage medium and electronic device
CN104346698A (en) Catering member big data analysis and checking system based on cloud computing and data mining
CN110147389A (en) Account number treating method and apparatus, storage medium and electronic device
CN114611738A (en) Load prediction method based on user electricity consumption behavior analysis
CN110009427B (en) Intelligent electric power sale amount prediction method based on deep circulation neural network
CN112508254B (en) Method for determining investment prediction data of transformer substation engineering project
CN116611589B (en) Power failure window period prediction method, system, equipment and medium for main network power transmission and transformation equipment
CN114021425A (en) Power system operation data modeling and feature selection method and device, electronic equipment and storage medium
CN107274025B (en) System and method for realizing intelligent identification and management of power consumption mode
Wang et al. Cloud computing and extreme learning machine for a distributed energy consumption forecasting in equipment-manufacturing enterprises
Sari et al. The effectiveness of hybrid backpropagation Neural Network model and TSK Fuzzy Inference System for inflation forecasting
CN114676931B (en) Electric quantity prediction system based on data center technology
Tee et al. Short-term load forecasting using artificial neural networks
Ignatiadis et al. Forecasting residential monthly electricity consumption using smart meter data
CN113642632B (en) Power system customer classification method and device based on self-adaptive competition and equalization optimization
CN114638171A (en) Power grid project investment prediction method and device, storage medium and equipment
CN114581263A (en) Power grid load analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210416

Address after: 210046, 66 new model street, Gulou District, Jiangsu, Nanjing

Patentee after: NANJING University OF POSTS AND TELECOMMUNICATIONS

Patentee after: STATE GRID ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd.

Address before: 210046, 66 new model street, Gulou District, Jiangsu, Nanjing

Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180518
