CN106909933A - A three-stage multi-view feature fusion method for electricity theft classification prediction

Publication number
CN106909933A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710036718.XA
Other languages
Chinese (zh)
Other versions
CN106909933B (en
Inventor
欧阳志友
岳东
薛禹胜
窦春霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
State Grid Electric Power Research Institute
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201710036718.XA
Publication of CN106909933A
Application granted
Publication of CN106909933B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R22/00 Arrangements for measuring time integral of electric power or current, e.g. electricity meters
    • G01R22/06 Arrangements for measuring time integral of electric power or current, e.g. electricity meters by electronic methods
    • G01R22/061 Details of electronic electricity meters
    • G01R22/066 Arrangements for avoiding or indicating fraudulent use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Abstract

The invention discloses a three-stage multi-view feature fusion method for classifying and predicting electricity consumption behavior. First, the customer electricity consumption data to be analyzed is taken as the test set, and the missing values in daily electricity consumption, current-day meter reading, and previous-day meter reading are filled with "-1" and with "0" respectively, forming two copies of preprocessed data. Second, for each copy of preprocessed data, features are extracted from different views, the features extracted from all views are merged, and several different classification prediction machine learning algorithms are applied to obtain the theft probability of each customer in the training set and the test set. Finally, a linear model and a tree model each make predictions from the second-stage outputs, and the two results are averaged to give the final predicted theft probability. Building on the existing stacking ensemble learning approach, the invention adds data diversity, model diversity, and handling of overfitting, and can therefore predict customer theft probability more accurately.

Description

A three-stage multi-view feature fusion method for electricity theft classification prediction
Technical field
The present invention relates to machine learning methods for classifying and predicting customer electricity consumption behavior, and more particularly to a three-stage multi-view feature fusion method for electricity theft classification prediction.
Background
Social and economic development has increased electricity consumption year by year and, driven by profit, abnormal consumption by customers, that is, electricity theft, has become increasingly serious. Electricity theft not only causes heavy economic losses to power supply enterprises but also seriously disrupts the normal order of power supply and use. According to statistics from State Grid Corporation of China, losses caused by customer electricity theft in recent years have reached upwards of ten million yuan. In recent years, theft has also evolved from crude tampering into high-tech theft characterized by intelligent devices, specialized techniques, concealed behavior, and large-scale operation, which makes anti-theft work considerably more difficult. With the upgrading of power systems and the spread of smart electrical equipment, grid companies can collect massive customer consumption data and equipment monitoring data in real time, which provides the basis for predicting customer theft behavior with big data analysis. Predicting customer theft probability through big data analysis allows anti-theft monitoring and analysis to be carried out scientifically, improves the efficiency of anti-theft work, and reduces the time and cost of analyzing theft behavior.
When the consumption behavior of a large number of customers is analyzed, the customer base is huge and historical consumption data is often severely incomplete. Existing machine learning methods therefore face challenges in missing value handling, feature extraction, feature selection, model fusion, and other aspects: they demand substantial computing resources and also require a great deal of time to combine and select among features with hundreds or thousands of dimensions. Moreover, a single classification algorithm rarely yields good predictions of customer theft probability. A method that copes better with missing data, simplifies feature selection, and improves prediction accuracy therefore meets a strong social need and has considerable economic value.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the defects described in the background, a three-stage multi-view feature fusion method for electricity theft classification prediction.
The invention adopts the following technical solution to solve the above technical problem:
A three-stage multi-view feature fusion method for electricity theft classification prediction, comprising the following steps:
Step 1), the customer electricity consumption data to be analyzed is taken as the test set, and the missing values in daily electricity consumption, current-day meter reading, and previous-day meter reading are filled with "-1" and with "0" respectively, forming two copies of preprocessed data;
Step 2), for each copy of preprocessed data:
Step 2.1), at least two of the three views of time window statistics, abnormal abrupt-change value statistics, and time series analysis are selected for feature extraction; the set of feature values extracted from each view forms an individual feature cluster; the extracted individual feature clusters are then merged into one feature cluster; and the set formed by the individual feature clusters and the merged feature cluster is taken as the feature cluster set of that preprocessed data;
Step 2.2), for each feature cluster in the feature cluster set, at least one binary classification algorithm uses that feature cluster to predict the theft probability of each customer in the preset training set and in the test set of customer electricity consumption data;
Step 3), for each customer in the training set and the test set, the predicted theft probabilities obtained from the two copies of preprocessed data form that customer's set of predicted theft probabilities;
Step 4), using the sets of predicted theft probabilities of all customers in the training set and the test set as features, a tree classification model and a linear classification model each make predictions for the test set, and the two resulting probabilities are averaged to obtain the final predicted theft probability of each customer in the customer electricity consumption data to be analyzed;
Step 5), the final predicted theft probability of each customer in the customer electricity consumption data to be analyzed is compared with a preset theft probability threshold; customers whose final predicted theft probability is greater than the threshold are classified as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold are classified as normal customers.
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification prediction of the invention, when three views are selected in step 2.1), the detailed steps of feature extraction are:
Step 2.1.1), for each user, electricity consumption statistics are computed for every month and taken as the time window feature cluster; the statistics include the maximum, minimum, mean, mean square deviation, and root variance of electricity consumption;
Step 2.1.2), abrupt numerical changes in daily electricity consumption, current-day meter reading, and previous-day meter reading are counted and taken as the abrupt-change feature cluster; these cases include a current-day meter reading lower than the previous-day meter reading, missing daily electricity consumption, missing current-day meter reading, missing previous-day meter reading, and negative daily electricity consumption;
Step 2.1.3), for each user, daily electricity consumption is arranged in chronological order into a time series, and time series features such as the number of peaks, number of troughs, mean, quantiles, seasonal trend, and periodic trend are extracted as the time series feature cluster;
Step 2.1.4), the time window feature cluster, the abrupt-change feature cluster, and the time series feature cluster are merged into one feature cluster;
Step 2.1.5), the set formed by the time window feature cluster, the abrupt-change feature cluster, the time series feature cluster, and the merged feature cluster is taken as the feature cluster set of the preprocessed data.
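As a rough illustration of the time series view in step 2.1.3), the sketch below counts peaks and troughs and computes the mean and quartiles of one customer's daily consumption series. The function name time_series_features, the strict local-extremum definition of a peak or trough, and the choice of quartiles are assumptions made for illustration; the patent does not specify them, and seasonal and periodic trend extraction is left out.

```python
import statistics


def time_series_features(daily_kwh):
    """Hypothetical helper: summarise one customer's daily consumption series."""
    peaks = troughs = 0
    # Assumption: a peak (trough) is a value strictly above (below) both neighbours.
    for prev, cur, nxt in zip(daily_kwh, daily_kwh[1:], daily_kwh[2:]):
        if cur > prev and cur > nxt:
            peaks += 1
        elif cur < prev and cur < nxt:
            troughs += 1
    # Assumption: quartiles stand in for the unspecified "quantiles".
    q25, q50, q75 = statistics.quantiles(daily_kwh, n=4)
    return {
        "peak_count": peaks,
        "trough_count": troughs,
        "mean": statistics.fmean(daily_kwh),
        "q25": q25,
        "q50": q50,
        "q75": q75,
    }


# Illustrative values only, not data from the patent.
features = time_series_features([3.0, 5.0, 2.0, 4.0, 4.5, 1.0, 6.0])
```

The resulting dictionary plays the role of one customer's row in the time series feature cluster; it requires Python 3.8 or later for statistics.quantiles and statistics.fmean.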
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification prediction of the invention, the detailed steps of step 2.2) are:
For each feature cluster in the feature cluster set, at least one binary classification algorithm uses that feature cluster to predict the theft probability of each customer in the preset training set and in the test set of customer electricity consumption data:
Step 2.2.1), the training set is divided into N parts of training data by random sampling of customers;
Step 2.2.2), for each part of training data:
that part serves as the validation subset and the union of the remaining N-1 parts serves as the training subset; each feature cluster in the feature cluster set is used in turn, with at least one binary classification method, to predict the theft probability of the customers in that part of training data and in the test set;
Step 2.2.3), the prediction results for all parts of training data from step 2.2.2) are combined to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), the theft probabilities predicted in step 2.2.2) for each customer in the test set, one per part of training data, are averaged to obtain the predicted theft probability of each customer in the test set.
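Steps 2.2.1) to 2.2.4) describe what is commonly called out-of-fold prediction. The following is a minimal sketch under stated assumptions: the names out_of_fold_predict, fit_predict, and base_rate_model are hypothetical, and base_rate_model is a deliberately trivial stand-in for a real binary classifier such as XGBoost or logistic regression, used only so that the example runs without third-party libraries.

```python
import random


def out_of_fold_predict(train_X, train_y, test_X, fit_predict, n_folds=5, seed=0):
    """Return (train_pred, test_pred) following steps 2.2.1) to 2.2.4)."""
    idx = list(range(len(train_X)))
    random.Random(seed).shuffle(idx)                # 2.2.1) random split by customer
    folds = [idx[k::n_folds] for k in range(n_folds)]
    train_pred = [None] * len(train_X)
    test_sum = [0.0] * len(test_X)
    for fold in folds:                              # 2.2.2) each part is held out once
        held_out = set(fold)
        sub_idx = [i for i in idx if i not in held_out]
        sub_X = [train_X[i] for i in sub_idx]
        sub_y = [train_y[i] for i in sub_idx]
        val_X = [train_X[i] for i in fold]
        preds = fit_predict(sub_X, sub_y, val_X + list(test_X))
        for i, p in zip(fold, preds[:len(fold)]):   # 2.2.3) stitch held-out predictions
            train_pred[i] = p
        for j, p in enumerate(preds[len(fold):]):
            test_sum[j] += p
    test_pred = [s / n_folds for s in test_sum]     # 2.2.4) average over the N models
    return train_pred, test_pred


def base_rate_model(sub_X, sub_y, predict_X):
    """Toy stand-in classifier: predicts the theft rate of the training subset."""
    rate = sum(sub_y) / len(sub_y)
    return [rate] * len(predict_X)


# Illustrative values only: 10 training customers, 2 of them labelled as theft.
train_X = [[float(i)] for i in range(10)]
train_y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
test_X = [[0.5], [7.5]]
train_pred, test_pred = out_of_fold_predict(train_X, train_y, test_X, base_rate_model)
```

Because every training customer is predicted by a model that never saw that customer's label, the stitched training predictions can serve as features for the later stage without leaking labels.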
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification prediction of the invention, the binary classification methods used in step 2.2.1) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification prediction of the invention, the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification prediction of the invention, the linear classification model in step 4) is one of XGBoost with booster set to gblinear, LogisticRegression, and Linear Regression.
Compared with the prior art, the above technical solution has the following technical effects:
1. With the method of the invention, feature selection only needs to be considered within the feature set of a single view, which avoids the large amount of computing resources and time that existing methods need to perform feature selection over thousands of feature dimensions;
2. Compared with existing machine learning or ensemble learning methods, the method is more effective on real-world data sets with a large amount of missing data; by increasing the diversity of data sets and of models and by guarding against overfitting, it can improve prediction accuracy while reducing computation;
3. The method does not require changing existing algorithms for classifying and predicting customer consumption behavior, and can be implemented by making full use of existing classification prediction algorithms.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of three-stage multi-view feature fusion in the invention.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the accompanying drawing:
The invention discloses a three-stage multi-view feature fusion method for electricity theft classification prediction, comprising the following steps:
Step 1), the customer electricity consumption data to be analyzed is taken as the test set, and the missing values in daily electricity consumption, current-day meter reading, and previous-day meter reading are filled with "-1" and with "0" respectively, forming two copies of preprocessed data;
Step 2), for each copy of preprocessed data:
Step 2.1), at least two of the three views of time window statistics, abnormal abrupt-change value statistics, and time series analysis are selected for feature extraction; the set of feature values extracted from each view forms an individual feature cluster; the extracted individual feature clusters are then merged into one feature cluster; and the set formed by the individual feature clusters and the merged feature cluster is taken as the feature cluster set of that preprocessed data;
Step 2.2), for each feature cluster in the feature cluster set, at least one binary classification algorithm uses that feature cluster to predict the theft probability of each customer in the preset training set and in the test set of customer electricity consumption data;
Step 3), for each customer in the training set and the test set, the predicted theft probabilities obtained from the two copies of preprocessed data form that customer's set of predicted theft probabilities;
Step 4), using the sets of predicted theft probabilities of all customers in the training set and the test set as features, a tree classification model and a linear classification model each make predictions for the test set, and the two resulting probabilities are averaged to obtain the final predicted theft probability of each customer in the customer electricity consumption data to be analyzed;
Step 5), the final predicted theft probability of each customer in the customer electricity consumption data to be analyzed is compared with a preset theft probability threshold; customers whose final predicted theft probability is greater than the threshold are classified as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold are classified as normal customers.
When three views are selected in step 2.1), the detailed steps of feature extraction are:
Step 2.1.1), for each user, electricity consumption statistics are computed for every month and taken as the time window feature cluster; the statistics include the maximum, minimum, mean, mean square deviation, and root variance of electricity consumption;
Step 2.1.2), abrupt numerical changes in daily electricity consumption, current-day meter reading, and previous-day meter reading are counted and taken as the abrupt-change feature cluster; these cases include a current-day meter reading lower than the previous-day meter reading, missing daily electricity consumption, missing current-day meter reading, missing previous-day meter reading, and negative daily electricity consumption;
Step 2.1.3), for each user, daily electricity consumption is arranged in chronological order into a time series, and time series features such as the number of peaks, number of troughs, mean, quantiles, seasonal trend, and periodic trend are extracted as the time series feature cluster;
Step 2.1.4), the time window feature cluster, the abrupt-change feature cluster, and the time series feature cluster are merged into one feature cluster;
Step 2.1.5), the set formed by the time window feature cluster, the abrupt-change feature cluster, the time series feature cluster, and the merged feature cluster is taken as the feature cluster set of the preprocessed data.
The detailed steps of step 2.2) are:
For each feature cluster in the feature cluster set, at least one binary classification algorithm uses that feature cluster to predict the theft probability of each customer in the preset training set and in the test set of customer electricity consumption data:
Step 2.2.1), the training set is divided into N parts of training data by random sampling of customers;
Step 2.2.2), for each part of training data:
that part serves as the validation subset and the union of the remaining N-1 parts serves as the training subset; each feature cluster in the feature cluster set is used in turn, with at least one binary classification method, to predict the theft probability of the customers in that part of training data and in the test set;
Step 2.2.3), the prediction results for all parts of training data from step 2.2.2) are combined to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), the theft probabilities predicted in step 2.2.2) for each customer in the test set, one per part of training data, are averaged to obtain the predicted theft probability of each customer in the test set.
The binary classification methods used in step 2.2.1) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
The tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
The linear classification model in step 4) is one of XGBoost with booster set to gblinear, LogisticRegression, and Linear Regression.
As shown in Fig. 1, in one embodiment of the invention, 2 preprocessed data sets are used; for simplicity, only 2 views are selected for feature extraction, namely time window statistical features and abnormal abrupt-change features; 2 classification algorithms are selected; and the data is divided into 5 parts for feature fusion (N=5).
This embodiment comprises the following steps:
Step 1), for the data to be predicted, the missing values of daily electricity consumption (KWH), current-day meter reading (KWH_READING), and previous-day meter reading (KWH_READING1) are filled with -1 and with 0 respectively, producing two preprocessed files PD1 and PD2.
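A minimal sketch of this dual filling might look as follows. The column names KWH, KWH_READING, and KWH_READING1 come from the embodiment; the helper fill_missing, the customer number column CONS_NO, the use of None to mark a missing value, and the sample rows are assumptions for illustration.

```python
def fill_missing(records, fill_value, columns=("KWH", "KWH_READING", "KWH_READING1")):
    """Return a copy of the records with None in the given columns replaced."""
    return [
        {k: (fill_value if k in columns and v is None else v) for k, v in row.items()}
        for row in records
    ]


# Illustrative rows only; CONS_NO is a hypothetical customer number column.
raw = [
    {"CONS_NO": "A001", "KWH": 3.2, "KWH_READING": 120.5, "KWH_READING1": 117.3},
    {"CONS_NO": "A001", "KWH": None, "KWH_READING": None, "KWH_READING1": 120.5},
]
PD1 = fill_missing(raw, -1)  # first preprocessed copy: missing values become -1
PD2 = fill_missing(raw, 0)   # second preprocessed copy: missing values become 0
```

Keeping both copies lets later stages see missing readings once as an impossible value (-1) and once as a plausible one (0), which is how the method increases data diversity.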
Step 2), features are extracted from PD1 and PD2 from the 2 views of time window statistical features and abnormal abrupt-change features, giving V11, V12, V21, and V22, the union V1A of V11 and V12, and the union V2A of V21 and V22:
Step 2.1), after grouping by customer, time is divided by month into time windows, and features of daily electricity consumption in each window are computed, including the maximum, minimum, median, mean, number of zeros, number of consecutive zeros, quantiles, and so on, as the time window features. Time window features are extracted from PD1 and PD2 respectively, giving V11 and V21;
Step 2.2), after grouping by customer and sorting by time in ascending order, the cases where daily electricity consumption is negative, daily electricity consumption is 0, the current-day meter reading is lower than the previous-day meter reading, and so on are counted as the abnormal abrupt-change features. Abnormal abrupt-change features are extracted from PD1 and PD2 respectively, giving V12 and V22;
Step 2.3), the feature sets of the views of PD1 are merged, that is, V11 and V12 are merged to give the feature union V1A; the feature sets of the views of PD2 are merged, that is, V21 and V22 are merged to give the feature union V2A;
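The two views and their union can be sketched for a single customer as below. This is an illustration under assumptions: the DATE column, the feature key names, and the sample rows are hypothetical, only a subset of the listed statistics is computed, and the rows are taken to come from PD1, so a filled value of -1 is counted as negative consumption.

```python
import statistics
from collections import defaultdict


def time_window_features(rows):
    """View 1: monthly statistics of daily consumption for one customer."""
    by_month = defaultdict(list)
    for row in rows:
        by_month[row["DATE"][:7]].append(row["KWH"])  # "YYYY-MM" is the window key
    feats = {}
    for month, values in sorted(by_month.items()):
        feats[f"{month}_max"] = max(values)
        feats[f"{month}_min"] = min(values)
        feats[f"{month}_median"] = statistics.median(values)
        feats[f"{month}_mean"] = statistics.fmean(values)
        feats[f"{month}_zeros"] = sum(1 for v in values if v == 0)
    return feats


def anomaly_features(rows):
    """View 2: counts of abnormal abrupt changes for one customer."""
    rows = sorted(rows, key=lambda r: r["DATE"])  # ascending time order
    return {
        "negative_kwh_days": sum(1 for r in rows if r["KWH"] < 0),
        "zero_kwh_days": sum(1 for r in rows if r["KWH"] == 0),
        "reading_drop_days": sum(
            1 for r in rows if r["KWH_READING"] < r["KWH_READING1"]
        ),
    }


# Illustrative PD1 rows for one customer; the third day was missing and filled with -1.
rows = [
    {"DATE": "2016-01-01", "KWH": 3.0, "KWH_READING": 103.0, "KWH_READING1": 100.0},
    {"DATE": "2016-01-02", "KWH": 0.0, "KWH_READING": 103.0, "KWH_READING1": 103.0},
    {"DATE": "2016-01-03", "KWH": -1, "KWH_READING": -1, "KWH_READING1": 103.0},
    {"DATE": "2016-02-01", "KWH": 5.0, "KWH_READING": 110.0, "KWH_READING1": 105.0},
]
V11 = time_window_features(rows)
V12 = anomaly_features(rows)
V1A = {**V11, **V12}  # union of the two views
```

Running the same two functions on the PD2 rows would give V21, V22, and V2A.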
Step 3), two different classification prediction algorithms are applied to each feature set to predict the theft probability of the customers in the training set and the test set:
Step 3.1), for each feature set, the training set is divided into 5 parts (N=5);
Step 3.2), any 4 parts of the training data are used to train a model with the classification prediction algorithm, and the model then predicts the theft probability of the customers in the remaining part of the training data and in the test data;
Step 3.3), the theft probability predictions for the training data obtained in step 3.2) are combined to give the theft probability of every customer in the training set; the theft probability predictions for the test set customers obtained in step 3.2) are averaged to give the predicted theft probability of the customers in the test set;
Step 3.4), with classification prediction algorithm M, steps 3.1), 3.2), and 3.3) are applied to each of the feature sets V11, V12, V1A, V21, V22, and V2A, giving the theft prediction probabilities M11, M12, M1A, M21, M22, and M2A for the feature sets; with classification prediction algorithm N (N and M being different classification prediction algorithms), steps 3.1), 3.2), and 3.3) are applied to each of the same feature sets, giving the theft prediction probabilities N11, N12, N1A, N21, N22, and N2A;
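The bookkeeping in step 3.4) amounts to pairing each of the six feature sets with each of the two algorithms. A small sketch, with the list names chosen here purely for illustration:

```python
# Suffixes of the six feature sets V11, V12, V1A, V21, V22, V2A.
feature_sets = ["11", "12", "1A", "21", "22", "2A"]
# Labels of the two different classification prediction algorithms.
algorithms = ["M", "N"]

# One first-stage prediction column per (feature set, algorithm) pair.
columns = [alg + fs for fs in feature_sets for alg in algorithms]
```

This yields twelve prediction columns per customer, which become the input features of the next stage.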
Step 4), the theft probability predictions from step 3) for the training set customers are used as the training set input features and those for the test set customers as the test set input features; a tree model and a linear model for classification prediction each predict the theft probability of the customers in the test set, and the predictions are averaged to give the final customer theft prediction:
Step 4.1), the prediction probabilities of the base models obtained in step 3) are used as features: M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, and N2A are joined with the customer number as the primary key, and the linear classification algorithm LogisticRegressionClassifier performs classification prediction, giving the predicted value LA of the theft probability of the customers in the test set;
Step 4.2), the prediction probabilities of the base models obtained in step 3) are used as features: M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, and N2A are joined with the customer number as the primary key, and the tree classification algorithm XGBoost performs classification prediction, giving the predicted value TA of the theft probability of the customers in the test set;
Step 4.3), the customer theft probability predictions from step 4.1) and step 4.2) are averaged to give the final predicted theft probability R of each customer;
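The join on customer number and the final averaging in step 4.3) can be sketched as follows. The helper names join_by_customer and average_predictions are hypothetical, only two of the twelve columns are shown, and the numbers for LA and TA are made-up placeholders standing in for the outputs of the linear and tree models, which are not fitted here.

```python
def join_by_customer(prediction_tables, column_order):
    """Build one feature row per customer, keyed by customer number."""
    # Keep only customers that appear in every first-stage prediction table.
    customers = sorted(set.intersection(*(set(t) for t in prediction_tables.values())))
    return {c: [prediction_tables[name][c] for name in column_order] for c in customers}


def average_predictions(la, ta):
    """Step 4.3): R is the mean of the linear-model and tree-model outputs."""
    return {c: (la[c] + ta[c]) / 2 for c in la}


# Placeholder first-stage outputs for two hypothetical customers.
first_stage = {
    "M11": {"A001": 0.10, "A002": 0.80},
    "N11": {"A001": 0.20, "A002": 0.70},
}
second_stage_rows = join_by_customer(first_stage, ["M11", "N11"])

# Placeholder second-stage outputs (not results from the patent).
LA = {"A001": 0.12, "A002": 0.78}  # linear model predictions
TA = {"A001": 0.08, "A002": 0.82}  # tree model predictions
R = average_predictions(LA, TA)
```

Averaging a linear and a tree-based second-stage model is the patent's stated guard against overfitting; the sketch only shows the data flow around those models.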
Step 5), the final predicted theft probability of each customer in the customer electricity consumption data to be analyzed is compared with a preset theft probability threshold; customers whose final predicted theft probability is greater than the threshold are classified as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold are classified as normal customers.
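The thresholding rule in step 5) is simple enough to state directly in code. In this sketch the function name classify, the default threshold of 0.5, and the sample probabilities are assumptions; the patent only says that the threshold is preset.

```python
def classify(final_probs, threshold=0.5):
    """Label a customer as theft only when the probability exceeds the threshold."""
    return {
        customer: ("theft" if prob > threshold else "normal")
        for customer, prob in final_probs.items()
    }


# Illustrative final probabilities R for three hypothetical customers.
labels = classify({"A001": 0.10, "A002": 0.80, "A003": 0.50})
```

Note that a probability exactly equal to the threshold is classified as normal, matching the "less than or equal to" wording of the step.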
The basic principle of the invention is as follows. First, the missing values of the customer electricity consumption data to be analyzed are filled in different ways to produce several different preprocessed data sets, which increases data diversity so that subsequent feature extraction and models can make better use of the information implicit in missing data. Second, during feature extraction, feature sets are built for each preprocessed data set from several views such as time window statistics, abrupt-change value statistics, and time series features, and the features extracted from these views are merged into one feature set, so that the features of each preprocessed data set characterize the data set better. Because the feature sets are built from different views, they differ greatly from one another, which avoids interference between features and reduces the computation of feature selection. In addition, because a feature union composed of the feature clusters of the different views is built for each preprocessed data set, the feature sets of the different views can be fused better, which benefits the final model fusion. During model construction, several existing mainstream classification algorithms are used, including XGBoost, Gradient Boosting Decision Tree, and Neural Network, which increases algorithm diversity so that the combination of different algorithms characterizes the data from different angles. Finally, taking the mean of the prediction probabilities of the tree model and the linear model as the final prediction better avoids model overfitting. The method achieves more accurate classification prediction of customer theft probability with fewer resources and therefore has better practical engineering value.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, are not to be interpreted in an idealized or overly formal sense.
The specific embodiments described above further detail the purpose, technical solution, and beneficial effects of the invention. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (6)

1. A three-stage, multi-view feature-fusion method for classifying and predicting electricity theft, characterized in that it comprises the following steps:
Step 1) Take the customer electricity-consumption data to be analyzed as the test set, and fill the missing values in the daily power consumption, the same-day meter reading, and the previous-day meter reading with "-1" and with "0" respectively, forming two copies of preprocessed data;
Step 2) For each copy of preprocessed data:
Step 2.1) Select at least two of the three views, namely time-window statistics, abnormal-mutation statistics, and time-series analysis, and extract features from them; treat the set of feature values extracted from each view as a single feature cluster; then merge the extracted single feature clusters into one feature cluster; and take the set formed by each single feature cluster and the merged feature cluster as the feature-cluster set of that preprocessed data;
Step 2.2) For each feature cluster in the feature-cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity-consumption data and in the test set;
Step 3) For each customer in the training set and the test set, collect every predicted theft probability obtained from the two copies of preprocessed data into that customer's set of predicted theft probabilities;
Step 4) Using the sets of predicted theft probabilities of all customers in the training set and the test set as features, predict the test set with a tree classification model and with a linear classification model respectively, and average the two resulting probability values to obtain the final predicted theft probability of each customer in the customer electricity-consumption data to be analyzed;
Step 5) Compare the final predicted theft probability of each customer in the customer electricity-consumption data to be analyzed with a preset theft-probability threshold; classify customers whose final predicted theft probability is greater than the threshold as electricity-theft customers, and classify customers whose final predicted theft probability is less than or equal to the threshold as normal customers.
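As an illustration only, steps 1), 3), and 5) of claim 1 could be sketched in Python as follows. The field names (`daily_kwh`, `reading_today`, `reading_prev`), the use of `None` to mark a missing value, the key layout of `per_model_probs`, and the threshold of 0.5 are assumptions made for this example and do not come from the claims.

```python
FIELDS = ("daily_kwh", "reading_today", "reading_prev")  # hypothetical field names


def fill_missing(records, fill_value):
    """Return a copy of the daily records with missing (None) values replaced."""
    return [
        {f: (fill_value if r.get(f) is None else r[f]) for f in FIELDS}
        for r in records
    ]


def two_preprocessed_copies(records):
    """Step 1: one copy filled with -1 and one copy filled with 0."""
    return fill_missing(records, -1), fill_missing(records, 0)


def probability_set(per_model_probs, customer_id):
    """Step 3: gather one customer's first-stage probabilities in a fixed key order."""
    return [per_model_probs[key][customer_id] for key in sorted(per_model_probs)]


def classify(final_probs, threshold):
    """Step 5: above the threshold means theft; at or below it means normal."""
    return {cid: ("theft" if p > threshold else "normal")
            for cid, p in final_probs.items()}


# Toy data (invented for the example).
records = [
    {"daily_kwh": 3.2, "reading_today": 105.2, "reading_prev": 102.0},
    {"daily_kwh": None, "reading_today": None, "reading_prev": 105.2},
]
copy_neg1, copy_zero = two_preprocessed_copies(records)

# Keys are (preprocessed copy, feature cluster, classifier); an assumed layout.
per_model_probs = {
    ("fill_-1", "merged", "model_a"): {"A": 0.8, "B": 0.1},
    ("fill_0", "merged", "model_a"): {"A": 0.6, "B": 0.3},
}
set_a = probability_set(per_model_probs, "A")
labels = classify({"A": 0.7, "B": 0.2, "C": 0.5}, threshold=0.5)
```

Customer "C" sits exactly on the threshold and is therefore labeled normal, matching the "less than or equal to" wording of step 5).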
2. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft as claimed in claim 1, characterized in that, when all three views are selected for feature extraction, the detailed steps of step 2.1) are:
Step 2.1.1) For each user, compute power-consumption statistics for every month and use them as the time-window feature cluster; the power-consumption statistics include the maximum, minimum, mean, mean square deviation, and root variance of the power consumption;
Step 2.1.2) Count the numerical mutations in the daily power consumption, the same-day meter reading, and the previous-day meter reading, and use them as the mutation feature cluster; the numerical mutations include a same-day meter reading lower than the previous-day meter reading, a missing daily power consumption, a missing same-day meter reading, a missing previous-day meter reading, and a negative daily power consumption;
Step 2.1.3) For each user, arrange the daily power consumption in chronological order to form a time series, and extract time-series features including the number of peaks, the number of troughs, the mean, the quantiles, the seasonal trend, and the periodic trend, as the time-series feature cluster;
Step 2.1.4) Merge the time-window feature cluster, the mutation feature cluster, and the time-series feature cluster into one feature cluster;
Step 2.1.5) Take the set formed by the time-window feature cluster, the mutation feature cluster, the time-series feature cluster, and the merged feature cluster as the feature-cluster set of the preprocessed data.
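The three views and the resulting feature-cluster set can be sketched as below. This is a minimal sketch under stated assumptions: the feature names are hypothetical, `None` marks a missing value, the population standard deviation stands in for the two dispersion statistics, the median stands in for the quantiles, and the seasonal and periodic trend features are omitted.

```python
from statistics import mean, median, pstdev


def time_window_features(monthly_kwh):
    """Step 2.1.1: per-month statistics; monthly_kwh maps month -> list of daily kWh."""
    feats = {}
    for month, values in monthly_kwh.items():
        feats[f"{month}_max"] = max(values)
        feats[f"{month}_min"] = min(values)
        feats[f"{month}_mean"] = mean(values)
        feats[f"{month}_std"] = pstdev(values)  # stand-in dispersion statistic
    return feats


def mutation_features(days):
    """Step 2.1.2: counts of abnormal values.

    days is a list of (daily_kwh, reading_today, reading_prev) tuples.
    """
    return {
        "n_reading_drop": sum(1 for d, t, p in days
                              if t is not None and p is not None and t < p),
        "n_daily_missing": sum(1 for d, t, p in days if d is None),
        "n_today_missing": sum(1 for d, t, p in days if t is None),
        "n_prev_missing": sum(1 for d, t, p in days if p is None),
        "n_daily_negative": sum(1 for d, t, p in days if d is not None and d < 0),
    }


def series_features(series):
    """Step 2.1.3: simple shape features of the chronological daily series."""
    inner = range(1, len(series) - 1)
    return {
        "n_peaks": sum(1 for i in inner if series[i - 1] < series[i] > series[i + 1]),
        "n_troughs": sum(1 for i in inner if series[i - 1] > series[i] < series[i + 1]),
        "ts_mean": mean(series),
        "ts_median": median(series),
    }


def feature_cluster_set(window, mutation, sequence):
    """Steps 2.1.4 and 2.1.5: the three single clusters plus their merged cluster."""
    merged = {**window, **mutation, **sequence}
    return {"window": window, "mutation": mutation,
            "sequence": sequence, "merged": merged}


# Toy data for one customer (invented for the example).
window = time_window_features({"2016-01": [1.0, 3.0], "2016-02": [2.0, 2.0]})
mutation = mutation_features([(1.0, 10.0, 9.0), (None, None, 10.0), (-0.5, 9.5, 10.0)])
sequence = series_features([1.0, 3.0, 2.0, 0.5, 2.5, 2.0])
clusters = feature_cluster_set(window, mutation, sequence)
```

Each of the four entries in `clusters` would then be passed separately to the first-stage classifiers, so that every view contributes its own prediction in addition to the merged one.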
3. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft as claimed in claim 2, characterized in that the detailed steps of step 2.2) are:
For each feature cluster in the feature-cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity-consumption data and in the test set;
Step 2.2.1) Divide the training-set data into N parts of training data by random sampling of customers;
Step 2.2.2) For each part of training data:
Use that part as the sub-validation set and the union of the remaining N-1 parts of training data as the sub-training set; then, using each feature cluster in the feature-cluster set in turn, predict the theft probability of the customers in the sub-validation set and in the test set with at least one binary classification method;
Step 2.2.3) Merge the prediction results obtained in step 2.2.2) for all parts of training data to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4) For each customer in the test set, average the theft probabilities obtained in step 2.2.2) for the test set across all parts of training data to obtain the predicted theft probability of each customer in the test set.
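Steps 2.2.1) to 2.2.4) describe what is commonly called out-of-fold prediction. A minimal sketch for one feature cluster and one classifier follows; the `fit_predict` interface and the `toy_scorer` classifier are hypothetical stand-ins for whichever binary classifier is actually used, and the data are invented.

```python
import random


def out_of_fold_predict(train_X, train_y, test_X, fit_predict, n_folds=5, seed=0):
    """Return (train_pred, test_pred) for one feature cluster and one classifier.

    fit_predict(X_fit, y_fit, X_score) -> list of probabilities (assumed interface).
    """
    idx = list(range(len(train_X)))
    random.Random(seed).shuffle(idx)                 # step 2.2.1: random split by customer
    folds = [idx[k::n_folds] for k in range(n_folds)]

    train_pred = [None] * len(train_X)
    test_sum = [0.0] * len(test_X)
    for fold in folds:                               # step 2.2.2: one pass per part
        held_out = set(fold)
        fit_idx = [i for i in idx if i not in held_out]
        X_fit = [train_X[i] for i in fit_idx]
        y_fit = [train_y[i] for i in fit_idx]
        val_pred = fit_predict(X_fit, y_fit, [train_X[i] for i in fold])
        for i, p in zip(fold, val_pred):             # step 2.2.3: merge validation predictions
            train_pred[i] = p
        for j, p in enumerate(fit_predict(X_fit, y_fit, test_X)):
            test_sum[j] += p
    test_pred = [s / n_folds for s in test_sum]      # step 2.2.4: average over the parts
    return train_pred, test_pred


def toy_scorer(X_fit, y_fit, X_score):
    """Stand-in classifier: 1.0 if the first feature is nearer the theft-class mean."""
    pos = [x[0] for x, y in zip(X_fit, y_fit) if y == 1]
    neg = [x[0] for x, y in zip(X_fit, y_fit) if y == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return [1.0 if abs(x[0] - mu_pos) < abs(x[0] - mu_neg) else 0.0 for x in X_score]


train_X = [[0.1], [0.2], [0.15], [0.9], [1.0], [0.95]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05], [0.97]]
train_pred, test_pred = out_of_fold_predict(train_X, train_y, test_X, toy_scorer, n_folds=3)
```

Because every training customer is scored only by a model that never saw that customer, the training-set predictions can serve as second-stage features without leaking the labels.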
4. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft as claimed in claim 3, characterized in that the binary classification methods used in step 2.2.1) comprise XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
5. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft as claimed in claim 3, characterized in that the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
6. The three-stage, multi-view feature-fusion method for classifying and predicting electricity theft as claimed in claim 3, characterized in that the linear classification model in step 4) is one of XGBoost with its booster set to gblinear, Logistic Regression, and Linear Regression.
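For the second stage, one possible pairing is two XGBoost models that differ only in their booster. The sketch below shows such parameter dictionaries and the averaging of step 4). Only `booster="gblinear"` is taken from the claims; `booster` and `objective` are XGBoost parameter names, the remaining values are assumptions, and the probability lists are invented rather than model output.

```python
# Hypothetical second-stage settings; only booster="gblinear" comes from the claims.
tree_params = {"booster": "gbtree", "objective": "binary:logistic"}
linear_params = {"booster": "gblinear", "objective": "binary:logistic"}


def blend(p_tree, p_linear):
    """Step 4: average the tree-model and linear-model probabilities per customer."""
    return [(a + b) / 2.0 for a, b in zip(p_tree, p_linear)]


# Invented probabilities for two test-set customers.
final_probs = blend([0.75, 0.5], [0.25, 1.0])
```

Averaging a tree-based and a linear learner is a simple way to combine models whose errors tend to differ; the claims specify an unweighted mean of the two outputs.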
CN201710036718.XA 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features Expired - Fee Related CN106909933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710036718.XA CN106909933B (en) 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710036718.XA CN106909933B (en) 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features

Publications (2)

Publication Number Publication Date
CN106909933A (en) 2017-06-30
CN106909933B (en) 2018-05-18

Family

ID=59206516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710036718.XA Expired - Fee Related CN106909933B (en) 2017-01-18 2017-01-18 A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features

Country Status (1)

Country Link
CN (1) CN106909933B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070136082A1 (en) * 2005-12-14 2007-06-14 Southern Company Services, Inc. System and method for energy diversion investigation management
CN102866321A (en) * 2012-08-13 2013-01-09 广东电网公司电力科学研究院 Self-adaptive stealing-leakage prevention diagnosis method
CN103778567A (en) * 2014-01-21 2014-05-07 深圳供电局有限公司 Method and system for discriminating abnormal electricity utilization of user
CN105069476A (en) * 2015-08-10 2015-11-18 国网宁夏电力公司 Method for identifying abnormal wind power data based on two-stage integration learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TANG YIJIA ET AL: "Anomaly detection of power Consumption based on waveform feature recognition", 《THE 11TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE&EDUCATION 》 *
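CHEN WENYING ET AL: "Anti-electricity-theft analysis using big data technology", 《JOURNAL OF ELECTRONIC MEASUREMENT AND INSTRUMENTATION》 *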

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107492043A (en) * 2017-09-04 2017-12-19 国网冀北电力有限公司电力科学研究院 stealing analysis method and device
CN107862347A (en) * 2017-12-04 2018-03-30 国网山东省电力公司济南供电公司 A kind of discovery method of the electricity stealing based on random forest
CN108490288A (en) * 2018-03-09 2018-09-04 华南师范大学 A kind of stealing detection method and system
CN108490288B (en) * 2018-03-09 2019-04-16 华南师范大学 A kind of stealing detection method and system
CN108961215A (en) * 2018-06-05 2018-12-07 上海大学 Parkinson's disease assistant diagnosis system and method based on Multimodal medical image
CN109359674A (en) * 2018-09-27 2019-02-19 智庭(北京)智能科技有限公司 A kind of smart lock method for detecting abnormality based on multi-model blending
CN109858679A (en) * 2018-12-30 2019-06-07 国网浙江省电力有限公司 A kind of opposing electricity-stealing for the man-machine object of combination checks monitoring system and its working method
CN110119755A (en) * 2019-03-22 2019-08-13 国网浙江省电力有限公司信息通信分公司 Electricity method for detecting abnormality based on Ensemble learning model
CN111507507A (en) * 2020-03-24 2020-08-07 重庆森鑫炬科技有限公司 Big data-based monthly water consumption prediction method
CN112101420A (en) * 2020-08-17 2020-12-18 广东工业大学 Abnormal electricity user identification method for Stacking integration algorithm under dissimilar model
CN112232985A (en) * 2020-10-15 2021-01-15 国网天津市电力公司 Power distribution and utilization data monitoring method and device for ubiquitous power Internet of things
CN112232985B (en) * 2020-10-15 2023-02-28 国网天津市电力公司 Power distribution and utilization data monitoring method and device for ubiquitous power Internet of things
CN112485491A (en) * 2020-11-23 2021-03-12 国网北京市电力公司 Power stealing identification method and device
CN112561569A (en) * 2020-12-07 2021-03-26 上海明略人工智能(集团)有限公司 Dual-model-based arrival prediction method and system, electronic device and storage medium
CN112561569B (en) * 2020-12-07 2024-02-27 上海明略人工智能(集团)有限公司 Dual-model-based store arrival prediction method, system, electronic equipment and storage medium
CN113128567A (en) * 2021-03-25 2021-07-16 云南电网有限责任公司 Abnormal electricity consumption behavior identification method based on electricity consumption data
CN113435513A (en) * 2021-06-28 2021-09-24 平安科技(深圳)有限公司 Insurance client grouping method, device, equipment and medium based on deep learning
CN113435513B (en) * 2021-06-28 2024-06-04 平安科技(深圳)有限公司 Deep learning-based insurance customer grouping method, device, equipment and medium
CN116954591A (en) * 2023-06-15 2023-10-27 天云融创数据科技(北京)有限公司 Generalized linear model training method, device, equipment and medium in banking field
CN116954591B (en) * 2023-06-15 2024-02-23 天云融创数据科技(北京)有限公司 Generalized linear model training method, device, equipment and medium in banking field
CN117033916A (en) * 2023-07-10 2023-11-10 国网四川省电力公司营销服务中心 Power theft detection method based on neural network
CN116933986A (en) * 2023-09-19 2023-10-24 国网湖北省电力有限公司信息通信公司 Electric power data safety management system based on deep learning
CN116933986B (en) * 2023-09-19 2024-01-23 国网湖北省电力有限公司信息通信公司 Electric power data safety management system based on deep learning

Also Published As

Publication number Publication date
CN106909933B (en) 2018-05-18

Similar Documents

Publication Publication Date Title
CN106909933A (en) A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features
CN110232203B (en) Knowledge distillation optimization RNN short-term power failure prediction method, storage medium and equipment
CN111738462B (en) Fault first-aid repair active service early warning method for electric power metering device
CN110082699A (en) A kind of low-voltage platform area intelligent electric energy meter kinematic error calculation method and its system
CN113177357B (en) Transient stability assessment method for power system
CN104537433A (en) Sold electricity quantity prediction method based on inventory capacities and business expansion characteristics
CN111241755A (en) Power load prediction method
CN106779219A (en) A kind of electricity demand forecasting method and system
CN111368904A (en) Electrical equipment identification method based on electric power fingerprint
CN112396234A (en) User side load probability prediction method based on time domain convolutional neural network
CN111582548A (en) Power load prediction method based on multivariate user behavior portrait
CN113780684A (en) Intelligent building user energy consumption behavior prediction method based on LSTM neural network
CN115688993A (en) Short-term power load prediction method suitable for power distribution station area
CN114118588A (en) Peak-facing summer power failure prediction method based on game feature extraction under clustering undersampling
CN113902062A (en) Transformer area line loss abnormal reason analysis method and device based on big data
CN114611738A (en) Load prediction method based on user electricity consumption behavior analysis
CN113762591B (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM countermeasure learning
Guan et al. Customer load forecasting method based on the industry electricity consumption behavior portrait
CN107093005A (en) The method that tax handling service hall's automatic classification is realized based on big data mining algorithm
CN112508254B (en) Method for determining investment prediction data of transformer substation engineering project
CN108830405B (en) Real-time power load prediction system and method based on multi-index dynamic matching
CN114021425A (en) Power system operation data modeling and feature selection method and device, electronic equipment and storage medium
Wang et al. Cloud computing and extreme learning machine for a distributed energy consumption forecasting in equipment-manufacturing enterprises
CN114676931B (en) Electric quantity prediction system based on data center technology
CN114298413A (en) Hydroelectric generating set runout trend prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210416

Address after: No. 66 Xinmofan Road, Gulou District, Nanjing, Jiangsu 210046

Patentee after: NANJING University OF POSTS AND TELECOMMUNICATIONS

Patentee after: STATE GRID ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd.

Address before: No. 66 Xinmofan Road, Gulou District, Nanjing, Jiangsu 210046

Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180518