CN106909933A - Three-stage multi-view feature fusion method for electricity theft classification and prediction - Google Patents
- Publication number
- CN106909933A (application number CN201710036718.XA)
- Authority
- CN
- China
- Prior art keywords
- stealing
- feature
- data
- cluster
- client
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R22/00—Arrangements for measuring time integral of electric power or current, e.g. electricity meters
- G01R22/06—Arrangements for measuring time integral of electric power or current, e.g. electricity meters by electronic methods
- G01R22/061—Details of electronic electricity meters
- G01R22/066—Arrangements for avoiding or indicating fraudulent use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a three-stage multi-view feature fusion method for classifying and predicting electricity consumption behavior. In the first stage, the customer electricity data to be analyzed is taken as the test set, and the missing values in daily power consumption, same-day meter reading, and previous-day meter reading are filled with "-1" and "0" respectively, forming two copies of preprocessed data. In the second stage, features are extracted from each copy of preprocessed data from different views, the features extracted from all views are merged, and several different machine learning classification algorithms are applied to obtain the electricity theft probability of each customer in the training set and the test set. In the third stage, a linear model and a tree model each make predictions from the second-stage outputs, and the two predictions are averaged to obtain the final predicted theft probability. Building on existing stacked-model ensemble learning, the invention adds data diversity, model diversity, and handling of overfitting, and can thereby predict customer theft probability more accurately.
Description
Technical field
The present invention relates to machine learning methods for classifying and predicting customer electricity consumption behavior, and in particular to a three-stage multi-view feature fusion method for electricity theft classification and prediction.
Background art
The development of the social economy has caused society's electricity consumption to increase year by year. Driven by profit, abnormal electricity use by customers, that is, electricity theft, has also become increasingly serious. Customer electricity theft not only causes heavy economic losses to power supply enterprises but also seriously disrupts the normal order of power supply and consumption. According to statistics from the State Grid Corporation of China, losses caused by customer electricity theft in recent years have reached upwards of ten million yuan. In recent years, customer theft has also evolved from crude theft into high-tech theft characterized by intelligent devices, specialized methods, concealed behavior, and scaled operation, which further increases the difficulty of anti-theft work. With the upgrading of power systems and the spread of smart power equipment, grid companies can collect massive customer electricity behavior data and power equipment monitoring data in real time, which provides a basis for predicting customer theft behavior through big data analysis. Predicting customer theft probability through big data analysis enables scientific anti-theft monitoring and analysis, improves the efficiency of anti-theft work, and reduces the time and cost of analyzing theft behavior.
When the electricity consumption behavior of a large number of customers is analyzed, the number of customers is huge and historical consumption data is often severely incomplete. Existing machine learning methods therefore face challenges in missing value handling, feature extraction, feature selection, model fusion, and other aspects: they demand substantial computing resources, and they require a great deal of time to combine and select among features with hundreds or thousands of dimensions. Meanwhile, a single classification algorithm rarely yields a good prediction of customer theft probability. Research on methods that better tolerate missing data, shorten the feature selection process, and improve prediction accuracy therefore has strong social demand and considerable economic value.
Summary of the invention
The technical problem to be solved by the invention is to address the defects described in the background art by providing a three-stage multi-view feature fusion method for electricity theft classification and prediction.
The present invention adopts the following technical scheme to solve the above technical problem:
A three-stage multi-view feature fusion method for electricity theft classification and prediction, comprising the following steps:
Step 1), take the customer electricity data to be analyzed as the test set, and fill the missing values in daily power consumption, same-day meter reading, and previous-day meter reading with "-1" and "0" respectively, forming two copies of preprocessed data;
Step 2), for each copy of preprocessed data:
Step 2.1), select at least two of the three views of time window statistics, abnormal mutation value statistics, and time series analysis, and extract features from them; take the set of feature values extracted from each view as a single feature cluster, merge the extracted single feature clusters into one feature cluster, and take the set formed by each single feature cluster and the merged feature cluster as the collection of feature clusters of that preprocessed data;
Step 2.2), for each feature cluster in the collection of feature clusters, apply at least one binary classification algorithm using that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set respectively;
Step 3), for each customer in the training set and the test set, form that customer's set of predicted theft probabilities from all the predicted theft probabilities obtained on the two copies of preprocessed data;
Step 4), using the sets of predicted theft probabilities of all customers in the training set and the test set as features, make predictions on the test set with a tree classification model and a linear classification model respectively, obtaining the final predicted theft probability of each customer in the customer electricity data to be analyzed;
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft probability threshold; classify customers whose final predicted theft probability is greater than the threshold as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold as normal customers.
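Step 1) can be illustrated with a minimal sketch using only the Python standard library. The record layout (one dictionary per customer-day, with `None` marking a missing value) and the function names are assumptions made for illustration; the column names KWH, KWH_READING, and KWH_READING1 are borrowed from the embodiment described later in this document.

```python
from typing import Dict, List, Optional, Tuple

# Column names taken from the embodiment; the dict-per-row layout is an assumption.
FIELDS = ("KWH", "KWH_READING", "KWH_READING1")

Record = Dict[str, Optional[float]]


def fill_missing(records: List[Record], fill_value: float) -> List[Record]:
    """Return a copy of the records with every missing (None) field replaced."""
    return [
        {f: (fill_value if r.get(f) is None else r[f]) for f in FIELDS}
        for r in records
    ]


def preprocess(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Step 1: build two preprocessed copies, one filled with -1 and one with 0."""
    return fill_missing(records, -1.0), fill_missing(records, 0.0)


# Made-up sample rows, for illustration only.
raw = [
    {"KWH": 3.2, "KWH_READING": 105.2, "KWH_READING1": 102.0},
    {"KWH": None, "KWH_READING": 107.0, "KWH_READING1": None},
]
pd1, pd2 = preprocess(raw)
```

Keeping the two filled copies separate, rather than choosing a single fill value, is what lets the later stages treat each copy as its own data set.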
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the detailed steps for extracting features when all three views are selected in step 2.1) are:
Step 2.1.1), for each user, compute power consumption statistics for every month and use them as the time window feature cluster; the power consumption statistics include the maximum, minimum, mean, mean square deviation, and root variance of power consumption;
Step 2.1.2), count the abrupt numerical changes in daily power consumption, same-day meter reading, and previous-day meter reading, and use them as the mutation feature cluster; these cases include a same-day meter reading lower than the previous-day meter reading, a missing daily power consumption, a missing same-day meter reading, a missing previous-day meter reading, and a negative daily power consumption;
Step 2.1.3), for each user, convert daily power consumption into a time series in chronological order, and extract the time series features, namely the number of peaks, number of troughs, mean, quantiles, seasonal trend, and periodic trend, as the temporal feature cluster;
Step 2.1.4), merge the time window feature cluster, the mutation feature cluster, and the temporal feature cluster into one feature cluster;
Step 2.1.5), take the set formed by the time window feature cluster, the mutation feature cluster, the temporal feature cluster, and the merged feature cluster as the collection of feature clusters of the preprocessed data.
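Steps 2.1.1) to 2.1.5) can be sketched for a single customer as follows. This is a simplified illustration under stated assumptions, not the patented implementation: rows are tuples of (date, daily consumption, same-day reading, previous-day reading), missing values are kept as `None` so they can still be counted, "mean square deviation" is read as the population standard deviation, and the seasonal and periodic trend features are omitted. All function and key names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Row layout (assumed): ("YYYY-MM-DD", kwh, reading, prev_reading); None = missing.


def time_window_features(rows):
    """Step 2.1.1: monthly statistics of daily power consumption."""
    by_month = defaultdict(list)
    for date, kwh, _, _ in rows:
        if kwh is not None:
            by_month[date[:7]].append(kwh)
    feats = {}
    for month, vals in sorted(by_month.items()):
        feats[f"win_{month}_max"] = max(vals)
        feats[f"win_{month}_min"] = min(vals)
        feats[f"win_{month}_mean"] = mean(vals)
        feats[f"win_{month}_std"] = pstdev(vals)
    return feats


def mutation_features(rows):
    """Step 2.1.2: counts of abnormal or missing values."""
    return {
        "mut_reading_drop": sum(
            1 for _, _, r, p in rows if r is not None and p is not None and r < p
        ),
        "mut_kwh_missing": sum(1 for _, k, _, _ in rows if k is None),
        "mut_reading_missing": sum(1 for _, _, r, _ in rows if r is None),
        "mut_prev_missing": sum(1 for _, _, _, p in rows if p is None),
        "mut_kwh_negative": sum(1 for _, k, _, _ in rows if k is not None and k < 0),
    }


def temporal_features(rows):
    """Step 2.1.3 (partial): peaks, troughs, mean, and median of the series."""
    ordered = sorted(rows, key=lambda row: row[0])
    series = [k for _, k, _, _ in ordered if k is not None]
    inner = range(1, len(series) - 1)
    peaks = sum(1 for i in inner if series[i - 1] < series[i] > series[i + 1])
    troughs = sum(1 for i in inner if series[i - 1] > series[i] < series[i + 1])
    return {
        "ts_peaks": peaks,
        "ts_troughs": troughs,
        "ts_mean": mean(series),
        "ts_median": median(series),
    }


def feature_cluster_collection(rows):
    """Steps 2.1.4 and 2.1.5: the three single clusters plus their merge."""
    window = time_window_features(rows)
    mutation = mutation_features(rows)
    temporal = temporal_features(rows)
    merged = {**window, **mutation, **temporal}
    return {"window": window, "mutation": mutation,
            "temporal": temporal, "merged": merged}


# Made-up sample rows, for illustration only.
rows = [
    ("2016-01-01", 5.0, 105.0, 100.0),
    ("2016-01-02", 9.0, 114.0, 105.0),
    ("2016-01-03", None, None, 114.0),
    ("2016-01-04", -2.0, 112.0, 114.0),
    ("2016-02-01", 4.0, 116.0, 112.0),
]
clusters = feature_cluster_collection(rows)
```

The key prefixes (`win_`, `mut_`, `ts_`) keep the feature names distinct, so the merged cluster contains every feature from the three views without collisions.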
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the detailed steps of step 2.2) are:
For each feature cluster in the collection of feature clusters, apply at least one binary classification algorithm using that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set respectively:
Step 2.2.1), divide the data of the training set into N parts of training data by random sampling of customers;
Step 2.2.2), for each part of the training data: take that part as the sub-validation set and the union of the remaining N-1 parts as the sub-training set; then, using each feature cluster in the collection of feature clusters in turn, predict the theft probability of the customers in that part and in the test set with at least one binary classification method;
Step 2.2.3), merge the prediction results for all parts of the training data from step 2.2.2) to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), average, for each customer in the test set, the theft probabilities predicted in step 2.2.2) for the individual parts of the training data, to obtain the predicted theft probability of each customer in the test set.
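Steps 2.2.1) to 2.2.4) describe what is commonly called out-of-fold prediction. The sketch below shows the bookkeeping for one feature cluster and one classifier, using only the standard library. The `fit` callable, the `fit_centroid` toy model, and the sample data are hypothetical stand-ins chosen so the example runs without third-party packages; in practice `fit` would wrap one of the classifiers named in this document.

```python
import random
from statistics import mean


def out_of_fold_predict(train_x, train_y, test_x, fit, n_parts=5, seed=0):
    """fit(xs, ys) must return a function mapping one sample to a probability."""
    idx = list(range(len(train_x)))
    random.Random(seed).shuffle(idx)                # step 2.2.1: random split
    parts = [idx[i::n_parts] for i in range(n_parts)]

    train_pred = [None] * len(train_x)
    test_preds = []
    for held_out in parts:                          # step 2.2.2: one part held out
        held = set(held_out)
        sub_train = [i for i in idx if i not in held]
        predict = fit([train_x[i] for i in sub_train],
                      [train_y[i] for i in sub_train])
        for i in held_out:                          # step 2.2.3: merge by position
            train_pred[i] = predict(train_x[i])
        test_preds.append([predict(x) for x in test_x])

    # Step 2.2.4: average the N test-set predictions per customer.
    test_pred = [mean(column) for column in zip(*test_preds)]
    return train_pred, test_pred


def fit_centroid(xs, ys):
    """Toy 1-D stand-in for a binary classifier (not a method from the patent)."""
    mu_pos = mean(x for x, y in zip(xs, ys) if y == 1)
    mu_neg = mean(x for x, y in zip(xs, ys) if y == 0)

    def predict(x):
        d_pos, d_neg = abs(x - mu_pos), abs(x - mu_neg)
        return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)

    return predict


# Made-up data: one feature per customer, label 1 = theft.
train_x = [0.1, 0.2, 0.3, 0.4, 0.5, 2.1, 2.2, 2.3, 2.4, 2.5]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
test_x = [0.0, 3.0]
train_pred, test_pred = out_of_fold_predict(train_x, train_y, test_x, fit_centroid)
```

Because every training-set prediction comes from a model that never saw that customer's label, the predictions can be reused as input features for the next stage without leaking the labels.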
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the binary classification methods used in step 2.2.1) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
As a further optimization of the three-stage multi-view feature fusion method for electricity theft classification and prediction of the present invention, the linear classification model in step 4) is one of XGBoost with the booster set to gblinear, LogisticRegression, and Linear Regression.
Compared with the prior art, the present invention, by adopting the above technical scheme, has the following technical effects:
1. The method only needs to consider feature selection within the feature set of a single view, avoiding the large amount of computing resources and time that existing methods require for feature selection over thousands of feature dimensions;
2. Compared with existing machine learning or ensemble learning methods, the method is more effective on real-world data sets that contain a large amount of missing data; by increasing the diversity of the data sets and of the models and by resisting overfitting, it improves prediction accuracy while reducing the amount of computation;
3. The method does not require changing existing algorithms for classifying and predicting customer electricity behavior during implementation, and can be realized by making full use of existing classification prediction algorithms.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of three-stage multi-view feature fusion in the present invention.
Detailed description
The technical scheme of the invention is described in further detail below with reference to the accompanying drawing:
As shown in Fig. 1, in one embodiment of the present invention, 2 preprocessed data sets are used; for simplicity, only 2 views are selected for feature extraction, namely time window statistical features and abnormal mutation features; 2 classification algorithms are selected; and the data is divided into 5 parts for feature fusion (N=5).
The present embodiment comprises the following steps:
Step 1), for the data to be predicted, fill the missing daily power consumption (KWH), same-day meter reading (KWH_READING), and previous-day meter reading (KWH_READING1) with -1 and 0 respectively, producing two preprocessed files PD1 and PD2.
Step 2), extract features from PD1 and PD2 separately from the 2 views of time window statistical features and abnormal mutation features, obtaining V11, V12, V21, V22, the union V1A of V11 and V12, and the union V2A of V21 and V22:
Step 2.1), after grouping by customer, divide time by month into time windows and compute the features of daily consumption in each time window, including the maximum, minimum, median, mean, number of zeros, number of consecutive zeros, deciles, etc., as the time window features. Extract time window features from PD1 and PD2 separately to obtain V11 and V21;
Step 2.2), after grouping by customer and sorting by time in ascending order, count the cases where daily power consumption is negative, daily power consumption is 0, the same-day meter reading is lower than the previous-day meter reading, etc., as the abnormal mutation features. Extract abnormal mutation features from PD1 and PD2 separately to obtain V12 and V22;
Step 2.3), merge the feature sets of the views of PD1, that is, merge V11 and V12, to obtain the merged feature set V1A; merge the feature sets of the views of PD2, that is, merge V21 and V22, to obtain the merged feature set V2A;
Step 3), apply two different classification prediction algorithms to each feature set to predict the theft probability of the customers in the training set and the test set:
Step 3.1), for each feature set, divide the training set into 5 parts (N=5).
Step 3.2), take any 4 parts of the training data, train a model with the classification prediction algorithm, and then predict the theft probability of the customers in the remaining part of the training data and in the test data;
Step 3.3), merge the theft probability predictions for the training data obtained in step 3.2) to obtain the theft probability of every customer in the training set; average the theft probability predictions for the customers in the test set obtained in step 3.2) to obtain the predicted theft probability of each customer in the test set;
Step 3.4), apply steps 3.1), 3.2), and 3.3) with classification prediction algorithm M to each of the feature sets V11, V12, V1A, V21, V22, V2A, obtaining the theft prediction probabilities M11, M12, M1A, M21, M22, M2A for the feature sets; apply steps 3.1), 3.2), and 3.3) with classification prediction algorithm N (N and M being different classification prediction algorithms) to each of the feature sets V11, V12, V1A, V21, V22, V2A, obtaining the theft prediction probabilities N11, N12, N1A, N21, N22, N2A for the feature sets;
Step 4), take the theft probability predictions from step 3) for the customers in the training set as the input features of the training set, and those for the test set as the input features of the test set; predict the theft probability of the customers in the test set with a tree model for classification prediction and a linear model for classification prediction respectively, and average the predictions to obtain the final customer theft prediction:
Step 4.1), take the predicted probabilities of the base models obtained in step 3) as features, join M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, N2A with the customer number as the primary key, and perform classification prediction with the linear classification algorithm LogisticRegressionClassifier, obtaining the predicted value LA of the theft probability of the customers in the test set;
Step 4.2), take the predicted probabilities of the base models obtained in step 3) as features, join M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, N2A with the customer number as the primary key, and perform classification prediction with the tree classification algorithm XGBoost, obtaining the predicted value TA of the theft probability of the customers in the test set;
Step 4.3), average the customer theft probability predictions from step 4.1) and step 4.2) to obtain the final predicted theft probability R of each customer;
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft probability threshold; classify customers whose final predicted theft probability is greater than the threshold as theft customers, and customers whose final predicted theft probability is less than or equal to the threshold as normal customers.
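The data handling in steps 4) and 5) of this embodiment can be sketched as follows, leaving out the training of the two second-level models. The dictionaries keyed by customer number, the customer numbers themselves, the probability values, and the threshold of 0.5 are all made up for illustration; the names M11 to N2A, LA, TA, and R follow the embodiment.

```python
BASE_NAMES = ["M11", "N11", "M12", "N12", "M1A", "N1A",
              "M21", "N21", "M22", "N22", "M2A", "N2A"]


def join_by_customer(base_preds):
    """Steps 4.1/4.2 input: one row of 12 base-model probabilities per customer."""
    customers = sorted(set.intersection(*(set(p) for p in base_preds.values())))
    return {c: [base_preds[name][c] for name in BASE_NAMES] for c in customers}


def final_probability(la, ta):
    """Step 4.3: R is the mean of the linear output LA and the tree output TA."""
    return {c: (la[c] + ta[c]) / 2 for c in la}


def classify(r, threshold):
    """Step 5: strictly above the threshold means theft, otherwise normal."""
    return {c: ("theft" if p > threshold else "normal") for c, p in r.items()}


# Made-up values, for illustration only.
base_preds = {name: {"C001": 0.9, "C002": 0.1} for name in BASE_NAMES}
meta_features = join_by_customer(base_preds)

la = {"C001": 0.875, "C002": 0.25}   # hypothetical linear-model outputs
ta = {"C001": 0.625, "C002": 0.75}   # hypothetical tree-model outputs
r = final_probability(la, ta)
labels = classify(r, threshold=0.5)
```

In this example customer C002 ends up exactly at the threshold and is therefore labelled normal, which matches the "less than or equal to" rule in step 5).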
The basic principle of the invention is as follows. First, the missing values of the customer electricity data to be analyzed are filled in different ways to produce several different preprocessed data sets, which increases the diversity of the data so that subsequent feature extraction and models can make better use of the information implicit in the missing data. Second, during feature extraction, feature sets are built for each preprocessed data set from several views, such as time window statistics, mutation value statistics, and time series features, and the features extracted from the several views are merged into one feature set, so that the features of each preprocessed data set better characterize the data. Because the feature sets are built from different views, they differ greatly from one another, which avoids interference between features and reduces the computation needed for feature selection. In addition, because a merged feature set combining the feature clusters of several different views is built for each preprocessed data set, the feature sets of the different views can be fused more effectively, which benefits the final model fusion. In model construction, several existing mainstream classification algorithms are used, including XGBoost, Gradient Boosting Decision Tree, and Neural Network, which increases the diversity of algorithms so that the combination of different algorithms characterizes the data from different angles. Finally, using the mean of the predicted probabilities of the tree model and the linear model as the final prediction better avoids model overfitting. The method achieves more accurate classification and prediction of customer theft probability with fewer resources and therefore has better practical engineering value.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as here, should not be interpreted in an idealized or overly formal sense.
The specific embodiments described above further detail the purpose, technical scheme, and beneficial effects of the present invention. It should be understood that the foregoing is only a specific embodiment of the invention and does not limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A three-stage multi-view feature fusion method for electricity theft classification prediction, characterized by comprising the following steps:
Step 1), taking the customer electricity consumption data to be analyzed as the test set, and filling the
missing values in daily power consumption, current-day meter reading, and previous-day meter reading with
"-1" and with "0" respectively, forming two copies of preprocessed data;
Step 2), for each copy of preprocessed data:
Step 2.1), selecting at least two of the three views of time window statistics, abnormal mutation value
statistics, and time series analysis to extract features, taking the set of feature values extracted from
each view as a separate feature cluster, then merging the extracted separate feature clusters into one
feature cluster, and taking the set formed by each separate feature cluster and the merged feature cluster
as the feature cluster set of that preprocessed data;
Step 2.2), for each feature cluster in the feature cluster set, using at least one binary classification
algorithm with that feature cluster to predict the electricity theft probability of each client in the
training set of the preset customer electricity consumption data and in the test set respectively;
Step 3), for each client in the training set and the test set, forming that client's predicted theft
probability set from all the theft probabilities predicted on the two copies of preprocessed data;
Step 4), using the predicted theft probability sets of all clients in the training set and the test set as
features, predicting on the test set with a tree classification model and a linear classification model
respectively, and averaging the two resulting prediction probability values to obtain the final predicted
theft probability of each client in the customer electricity consumption data to be analyzed;
Step 5), comparing the final predicted theft probability of each client in the customer electricity
consumption data to be analyzed with a preset theft probability threshold, classifying clients whose final
predicted theft probability is greater than the preset threshold as electricity theft clients, and
classifying clients whose final predicted theft probability is less than or equal to the preset threshold
as normal clients.
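Steps 1), 4) and 5) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of `None` to mark a missing value, and the example threshold of 0.5 are all assumptions made for illustration, since the claim only requires "a preset threshold".

```python
from typing import List, Optional, Sequence, Tuple

Reading = Optional[float]  # assumption: None marks a missing value


def fill_missing(values: Sequence[Reading], fill_value: float) -> List[float]:
    """Replace every missing entry with fill_value."""
    return [fill_value if v is None else v for v in values]


def make_preprocessed_copies(values: Sequence[Reading]) -> Tuple[List[float], List[float]]:
    """Step 1: build two copies, one filled with -1 and one filled with 0."""
    return fill_missing(values, -1.0), fill_missing(values, 0.0)


def final_probability(p_tree: float, p_linear: float) -> float:
    """Step 4: average the tree-model and linear-model probabilities."""
    return (p_tree + p_linear) / 2.0


def classify(probability: float, threshold: float = 0.5) -> str:
    """Step 5: strictly above the threshold means theft, otherwise normal."""
    return "theft" if probability > threshold else "normal"
```

A probability exactly equal to the threshold is labelled normal, matching the "less than or equal to" wording of step 5).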
2. The three-stage multi-view feature fusion method for electricity theft classification prediction as claimed in claim 1, characterized in that
the detailed steps for extracting features when all three views are selected in step 2.1) are:
Step 2.1.1), computing each user's power consumption statistics for every month and taking them as the
time window feature cluster, the power consumption statistics including the maximum, minimum, mean, mean
square deviation, and root variance of the power consumption;
Step 2.1.2), counting the numerical mutations in daily power consumption, current-day meter reading, and
previous-day meter reading and taking them as the mutation feature cluster, the numerical mutations
including a current-day meter reading lower than the previous-day meter reading, missing daily power
consumption, missing current-day meter reading, missing previous-day meter reading, and negative daily
power consumption;
Step 2.1.3), for each user, converting the daily power consumption into a time series in chronological
order and extracting time series features including the number of peaks, number of troughs, mean,
quantiles, seasonal trend, and periodic trend as the temporal feature cluster;
Step 2.1.4), merging the time window feature cluster, the mutation feature cluster, and the temporal feature cluster into one feature cluster;
Step 2.1.5), taking the set formed by the time window feature cluster, the mutation feature cluster, the
temporal feature cluster, and the merged feature cluster as the feature cluster set of the preprocessed data.
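The feature cluster set of claim 2 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function and key names are hypothetical, `None` is assumed to mark a missing value, population standard deviation and variance stand in for the "mean square deviation" and "root variance" of step 2.1.1), the median stands in for the quantiles, and the seasonal and periodic trend features of step 2.1.3) are omitted for brevity.

```python
import statistics
from typing import Dict, List, Optional, Sequence

Reading = Optional[float]  # assumption: None marks a missing value


def time_window_features(monthly_usage: Sequence[float]) -> Dict[str, float]:
    """Step 2.1.1: summary statistics over one user's monthly consumption."""
    return {
        "tw_max": max(monthly_usage),
        "tw_min": min(monthly_usage),
        "tw_mean": statistics.mean(monthly_usage),
        "tw_std": statistics.pstdev(monthly_usage),
        "tw_var": statistics.pvariance(monthly_usage),
    }


def mutation_features(daily_usage: Sequence[Reading], today: Sequence[Reading],
                      previous: Sequence[Reading]) -> Dict[str, float]:
    """Step 2.1.2: counts of abnormal or missing values."""
    drops = sum(1 for t, p in zip(today, previous)
                if t is not None and p is not None and t < p)
    return {
        "mu_reading_drop": float(drops),
        "mu_usage_missing": float(sum(1 for u in daily_usage if u is None)),
        "mu_today_missing": float(sum(1 for t in today if t is None)),
        "mu_prev_missing": float(sum(1 for p in previous if p is None)),
        "mu_negative_usage": float(sum(1 for u in daily_usage if u is not None and u < 0)),
    }


def time_series_features(series: Sequence[float]) -> Dict[str, float]:
    """Step 2.1.3 (partial): strict local peaks and troughs, mean, and median."""
    inner = range(1, len(series) - 1)
    peaks = sum(1 for i in inner if series[i - 1] < series[i] > series[i + 1])
    troughs = sum(1 for i in inner if series[i - 1] > series[i] < series[i + 1])
    return {
        "ts_peaks": float(peaks),
        "ts_troughs": float(troughs),
        "ts_mean": statistics.mean(series),
        "ts_median": statistics.median(series),
    }


def feature_cluster_set(monthly_usage: Sequence[float], daily_usage: Sequence[Reading],
                        today: Sequence[Reading],
                        previous: Sequence[Reading]) -> Dict[str, Dict[str, float]]:
    """Steps 2.1.4 and 2.1.5: three single-view clusters plus their merged cluster."""
    tw = time_window_features(monthly_usage)
    mu = mutation_features(daily_usage, today, previous)
    # Assumption: missing days are simply dropped before the time series view.
    ts = time_series_features([u for u in daily_usage if u is not None])
    return {"time_window": tw, "mutation": mu, "time_series": ts,
            "merged": {**tw, **mu, **ts}}
```

Each of the four entries in the returned dictionary would then be fed to the classifiers of step 2.2) separately, which is why the single-view clusters are kept alongside the merged one.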
3. The three-stage multi-view feature fusion method for electricity theft classification prediction as claimed in claim 2, characterized in that
the detailed steps of step 2.2) are:
for each feature cluster in the feature cluster set, using at least one binary classification algorithm
with that feature cluster to predict the electricity theft probability of each client in the training set
of the preset customer electricity consumption data and in the test set respectively;
Step 2.2.1), dividing the data of the training set into N parts of training data by random sampling of clients;
Step 2.2.2), for each part of training data:
taking that part as the sub-validation set and the union of the remaining N-1 parts of training data as
the sub-training set, and, using each feature cluster in the feature cluster set in turn, predicting the
theft probability of the clients in the sub-validation set and in the test set with at least one binary
classification method;
Step 2.2.3), merging the prediction results of all parts of training data in step 2.2.2) to obtain the
predicted theft probability of each client in the training set;
Step 2.2.4), averaging, for each client in the test set, the theft probabilities obtained for the test
set from each part of training data in step 2.2.2), to obtain the predicted theft probability of each
client in the test set.
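Steps 2.2.1) to 2.2.4) describe what is commonly called out-of-fold prediction. The sketch below is a hedged illustration using only the Python standard library; `out_of_fold_predict`, the `fit_predict` callback signature, and `base_rate_model` are hypothetical names introduced here, and the toy model merely stands in for a real binary classifier such as those listed in claim 4.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# Hypothetical callback: train on (fit_X, fit_y), return one probability per row of score_X.
FitPredict = Callable[[Sequence[Row], Sequence[int], Sequence[Row]], List[float]]


def out_of_fold_predict(X_train: Sequence[Row], y_train: Sequence[int],
                        X_test: Sequence[Row], fit_predict: FitPredict,
                        n_folds: int = 5, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (training-set predictions, fold-averaged test-set predictions)."""
    # Step 2.2.1: split the training clients into N parts by random sampling.
    indices = list(range(len(X_train)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]

    train_pred = [0.0] * len(X_train)
    test_sum = [0.0] * len(X_test)
    for val_idx in folds:
        # Step 2.2.2: one part is the sub-validation set, the rest the sub-training set.
        held_out = set(val_idx)
        fit_idx = [i for i in indices if i not in held_out]
        fit_X = [X_train[i] for i in fit_idx]
        fit_y = [y_train[i] for i in fit_idx]
        val_X = [X_train[i] for i in val_idx]
        scores = fit_predict(fit_X, fit_y, val_X + list(X_test))
        # Step 2.2.3: each training client is predicted exactly once, when held out.
        for i, p in zip(val_idx, scores[:len(val_idx)]):
            train_pred[i] = p
        for j, p in enumerate(scores[len(val_idx):]):
            test_sum[j] += p
    # Step 2.2.4: average the N test-set predictions per client.
    test_pred = [s / n_folds for s in test_sum]
    return train_pred, test_pred


def base_rate_model(fit_X: Sequence[Row], fit_y: Sequence[int],
                    score_X: Sequence[Row]) -> List[float]:
    """Toy stand-in classifier: predicts the positive rate of its training labels."""
    rate = sum(fit_y) / len(fit_y)
    return [rate] * len(score_X)
```

Because every training-set prediction comes from a model that never saw that client, the predictions can be used as features for the models of step 4) without leaking the labels.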
4. The three-stage multi-view feature fusion method for electricity theft classification prediction as claimed in claim 3, characterized in that
the binary classification methods used in step 2.2.1) comprise XGBoost, LightGBM, Keras, Neural Network,
Logistic Regression, and Gradient Boost Decision Tree.
5. The three-stage multi-view feature fusion method for electricity theft classification prediction as claimed in claim 3, characterized in that
the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and
Gradient Boosting Decision Tree.
6. The three-stage multi-view feature fusion method for electricity theft classification prediction as claimed in claim 3, characterized in that
the linear classification model in step 4) is one of XGBoost with the booster set to gblinear,
LogisticRegression, and Linear Regression.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710036718.XA CN106909933B (en) | 2017-01-18 | 2017-01-18 | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710036718.XA CN106909933B (en) | 2017-01-18 | 2017-01-18 | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106909933A true CN106909933A (en) | 2017-06-30 |
CN106909933B CN106909933B (en) | 2018-05-18 |
Family
ID=59206516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710036718.XA Expired - Fee Related CN106909933B (en) | 2017-01-18 | 2017-01-18 | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106909933B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070136082A1 (en) * | 2005-12-14 | 2007-06-14 | Southern Company Services, Inc. | System and method for energy diversion investigation management |
CN102866321A (en) * | 2012-08-13 | 2013-01-09 | 广东电网公司电力科学研究院 | Self-adaptive stealing-leakage prevention diagnosis method |
CN103778567A (en) * | 2014-01-21 | 2014-05-07 | 深圳供电局有限公司 | Method and system for discriminating abnormal electricity utilization of user |
CN105069476A (en) * | 2015-08-10 | 2015-11-18 | 国网宁夏电力公司 | Method for identifying abnormal wind power data based on two-stage integration learning |
Non-Patent Citations (2)
Title |
---|
TANG YIJIA ET AL: "Anomaly detection of power Consumption based on waveform feature recognition", 《THE 11TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE&EDUCATION 》 * |
陈文瑛 等: "应用大数据技术的反窃电分析", 《电子测量与仪器学报》 * |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107492043A (en) * | 2017-09-04 | 2017-12-19 | 国网冀北电力有限公司电力科学研究院 | stealing analysis method and device |
CN107862347A (en) * | 2017-12-04 | 2018-03-30 | 国网山东省电力公司济南供电公司 | A kind of discovery method of the electricity stealing based on random forest |
CN108490288A (en) * | 2018-03-09 | 2018-09-04 | 华南师范大学 | A kind of stealing detection method and system |
CN108490288B (en) * | 2018-03-09 | 2019-04-16 | 华南师范大学 | A kind of stealing detection method and system |
CN108961215A (en) * | 2018-06-05 | 2018-12-07 | 上海大学 | Parkinson's disease assistant diagnosis system and method based on Multimodal medical image |
CN109359674A (en) * | 2018-09-27 | 2019-02-19 | 智庭(北京)智能科技有限公司 | A kind of smart lock method for detecting abnormality based on multi-model blending |
CN109858679A (en) * | 2018-12-30 | 2019-06-07 | 国网浙江省电力有限公司 | A kind of opposing electricity-stealing for the man-machine object of combination checks monitoring system and its working method |
CN110119755A (en) * | 2019-03-22 | 2019-08-13 | 国网浙江省电力有限公司信息通信分公司 | Electricity method for detecting abnormality based on Ensemble learning model |
CN111507507A (en) * | 2020-03-24 | 2020-08-07 | 重庆森鑫炬科技有限公司 | Big data-based monthly water consumption prediction method |
CN112101420A (en) * | 2020-08-17 | 2020-12-18 | 广东工业大学 | Abnormal electricity user identification method for Stacking integration algorithm under dissimilar model |
CN112232985A (en) * | 2020-10-15 | 2021-01-15 | 国网天津市电力公司 | Power distribution and utilization data monitoring method and device for ubiquitous power Internet of things |
CN112232985B (en) * | 2020-10-15 | 2023-02-28 | 国网天津市电力公司 | Power distribution and utilization data monitoring method and device for ubiquitous power Internet of things |
CN112485491A (en) * | 2020-11-23 | 2021-03-12 | 国网北京市电力公司 | Power stealing identification method and device |
CN112561569A (en) * | 2020-12-07 | 2021-03-26 | 上海明略人工智能(集团)有限公司 | Dual-model-based arrival prediction method and system, electronic device and storage medium |
CN112561569B (en) * | 2020-12-07 | 2024-02-27 | 上海明略人工智能(集团)有限公司 | Dual-model-based store arrival prediction method, system, electronic equipment and storage medium |
CN113128567A (en) * | 2021-03-25 | 2021-07-16 | 云南电网有限责任公司 | Abnormal electricity consumption behavior identification method based on electricity consumption data |
CN113435513A (en) * | 2021-06-28 | 2021-09-24 | 平安科技(深圳)有限公司 | Insurance client grouping method, device, equipment and medium based on deep learning |
CN113435513B (en) * | 2021-06-28 | 2024-06-04 | 平安科技(深圳)有限公司 | Deep learning-based insurance customer grouping method, device, equipment and medium |
CN116954591A (en) * | 2023-06-15 | 2023-10-27 | 天云融创数据科技(北京)有限公司 | Generalized linear model training method, device, equipment and medium in banking field |
CN116954591B (en) * | 2023-06-15 | 2024-02-23 | 天云融创数据科技(北京)有限公司 | Generalized linear model training method, device, equipment and medium in banking field |
CN117033916A (en) * | 2023-07-10 | 2023-11-10 | 国网四川省电力公司营销服务中心 | Power theft detection method based on neural network |
CN116933986A (en) * | 2023-09-19 | 2023-10-24 | 国网湖北省电力有限公司信息通信公司 | Electric power data safety management system based on deep learning |
CN116933986B (en) * | 2023-09-19 | 2024-01-23 | 国网湖北省电力有限公司信息通信公司 | Electric power data safety management system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN106909933B (en) | 2018-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106909933A (en) | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features | |
CN110232203B (en) | Knowledge distillation optimization RNN short-term power failure prediction method, storage medium and equipment | |
CN111738462B (en) | Fault first-aid repair active service early warning method for electric power metering device | |
CN110082699A (en) | A kind of low-voltage platform area intelligent electric energy meter kinematic error calculation method and its system | |
CN113177357B (en) | Transient stability assessment method for power system | |
CN104537433A (en) | Sold electricity quantity prediction method based on inventory capacities and business expansion characteristics | |
CN111241755A (en) | Power load prediction method | |
CN106779219A (en) | A kind of electricity demand forecasting method and system | |
CN111368904A (en) | Electrical equipment identification method based on electric power fingerprint | |
CN112396234A (en) | User side load probability prediction method based on time domain convolutional neural network | |
CN111582548A (en) | Power load prediction method based on multivariate user behavior portrait | |
CN113780684A (en) | Intelligent building user energy consumption behavior prediction method based on LSTM neural network | |
CN115688993A (en) | Short-term power load prediction method suitable for power distribution station area | |
CN114118588A (en) | Peak-facing summer power failure prediction method based on game feature extraction under clustering undersampling | |
CN113902062A (en) | Transformer area line loss abnormal reason analysis method and device based on big data | |
CN114611738A (en) | Load prediction method based on user electricity consumption behavior analysis | |
CN113762591B (en) | Short-term electric quantity prediction method and system based on GRU and multi-core SVM countermeasure learning | |
Guan et al. | Customer load forecasting method based on the industry electricity consumption behavior portrait | |
CN107093005A (en) | The method that tax handling service hall's automatic classification is realized based on big data mining algorithm | |
CN112508254B (en) | Method for determining investment prediction data of transformer substation engineering project | |
CN108830405B (en) | Real-time power load prediction system and method based on multi-index dynamic matching | |
CN114021425A (en) | Power system operation data modeling and feature selection method and device, electronic equipment and storage medium | |
Wang et al. | Cloud computing and extreme learning machine for a distributed energy consumption forecasting in equipment-manufacturing enterprises | |
CN114676931B (en) | Electric quantity prediction system based on data center technology | |
CN114298413A (en) | Hydroelectric generating set runout trend prediction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20210416. Address after: 210046, 66 new model street, Gulou District, Jiangsu, Nanjing. Patentee after: NANJING University OF POSTS AND TELECOMMUNICATIONS; STATE GRID ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd. Address before: 210046, 66 new model street, Gulou District, Jiangsu, Nanjing. Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180518 |