CN106909933B - A three-stage multi-view feature fusion method for electricity-theft classification prediction - Google Patents
- Publication number
- CN106909933B (application number CN201710036718.XA)
- Authority
- CN
- China
- Prior art keywords
- electricity theft
- feature
- customer
- cluster
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01R—MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
- G01R22/00—Arrangements for measuring time integral of electric power or current, e.g. electricity meters
- G01R22/06—Arrangements for measuring time integral of electric power or current, e.g. electricity meters by electronic methods
- G01R22/061—Details of electronic electricity meters
- G01R22/066—Arrangements for avoiding or indicating fraudulent use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The invention discloses a three-stage multi-view feature fusion method for classifying and predicting electricity-usage behavior. First, the customer electricity data to be analyzed is taken as the test set, and the missing values in daily electricity consumption, same-day meter reading, and previous-day meter reading are filled with "-1" and with "0" respectively, forming two preprocessed datasets. Second, for each preprocessed dataset, features are extracted from different views, the features extracted from all views are merged, and several different classification machine learning algorithms are applied to obtain the electricity-theft probability of the customers in the training set and the test set. Finally, the output of the second stage is used for prediction with a linear model and a tree model respectively, and the two results are averaged to obtain the final predicted theft probability. On top of existing stacking ensemble learning methods, the invention adds data diversity, model diversity, and over-fitting handling, and thereby predicts customer theft probability more accurately.
Description
Technical field
The present invention relates to machine learning methods for classifying and predicting customer electricity-usage behavior, and in particular to a three-stage multi-view feature fusion method for electricity-theft classification prediction.
Background art
With social and economic development, society's electricity consumption has grown year by year. Driven by profit, abnormal electricity use by customers, that is, electricity theft, has also become increasingly serious. Electricity theft not only causes heavy economic losses to power supply enterprises but also seriously disrupts the normal order of electricity supply and use. According to statistics from State Grid Corporation of China, losses caused by customer electricity theft in recent years have exceeded ten million yuan. In recent years, theft methods have also evolved from crude theft to high-tech theft characterized by intelligent devices, specialized techniques, concealed behavior, and large-scale operation, which has made anti-theft work considerably more difficult. With the upgrading of power systems and the spread of smart power equipment, grid companies can collect massive amounts of customer electricity-usage data and equipment monitoring data in real time, which provides a foundation for predicting customer electricity theft through big data analysis. Predicting customers' theft probability through big data analysis supports anti-theft monitoring and analysis in a scientific way, improves the efficiency of anti-theft work, and reduces the time and cost of theft analysis.
When the electricity-usage behavior of a large number of customers is analyzed, the customer base is huge and historical usage data is often seriously incomplete. Existing machine learning methods therefore face challenges in missing-value handling, feature extraction, feature selection, and model fusion. They not only place high demands on computing resources but also require a great deal of time to combine and select among features with hundreds or thousands of dimensions. Meanwhile, a single classification algorithm rarely yields a good prediction of customer theft probability. Research into methods that adapt better to missing data, reduce the feature-selection effort, and improve prediction accuracy therefore answers a strong social need and has considerable economic value.
Summary of the invention
The technical problem to be solved by the invention is to address the defects described in the background art by providing a three-stage multi-view feature fusion method for electricity-theft classification prediction.
To solve this technical problem, the invention adopts the following technical solution:
A three-stage multi-view feature fusion method for electricity-theft classification prediction, comprising the following steps:
Step 1), take the customer electricity data to be analyzed as the test set, and fill the missing values in daily electricity consumption, same-day meter reading, and previous-day meter reading with "-1" and with "0" respectively, forming two preprocessed datasets;
Step 2), for each preprocessed dataset:
Step 2.1), select at least two of the three views (time-window statistics, abnormal mutation-value statistics, and time-series analysis) and extract features from each; the feature values extracted from each view form an individual feature cluster; the individual feature clusters are then merged into one feature cluster; the individual feature clusters together with the merged feature cluster form the feature cluster set of that preprocessed dataset;
Step 2.2), for each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the electricity-theft probability of each customer in the training set of preset customer electricity data and in the test set;
Step 3), for each customer in the training set and the test set, the theft probabilities predicted for that customer across the two preprocessed datasets form that customer's set of predicted theft probabilities;
Step 4), using the predicted theft-probability sets of all customers in the training set and the test set as features, predict on the test set with a tree classification model and with a linear classification model, obtaining the final predicted theft probability of each customer in the customer electricity data to be analyzed;
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft-probability threshold; customers whose final predicted theft probability exceeds the threshold are classified as electricity-theft customers, and customers whose final predicted theft probability is less than or equal to the threshold are classified as normal customers.
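Steps 1 and 5 are simple enough to sketch directly. The Python below is a minimal illustration, not the patented implementation: the column names are borrowed from the embodiment described later in this document, while the record layout, the function names, and the 0.5 threshold are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]

# Column names as used in the embodiment: daily consumption, same-day
# meter reading, previous-day meter reading.
METER_COLUMNS = ("KWH", "KWH_READING", "KWH_READING1")


def fill_missing(records: List[Record], fill_value: float) -> List[Record]:
    """Copy the records, replacing missing (None) meter values with fill_value."""
    return [
        {
            col: fill_value if col in METER_COLUMNS and val is None else val
            for col, val in rec.items()
        }
        for rec in records
    ]


def make_preprocessed_copies(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Step 1: one copy filled with -1, one copy filled with 0."""
    return fill_missing(records, -1.0), fill_missing(records, 0.0)


def label_customers(final_probs: Dict[str, float], threshold: float = 0.5) -> Dict[str, str]:
    """Step 5: strictly above the threshold means theft, otherwise normal.

    The default of 0.5 is a hypothetical value; the source only says the
    threshold is preset.
    """
    return {
        customer: "theft" if prob > threshold else "normal"
        for customer, prob in final_probs.items()
    }


raw = [
    {"KWH": 3.2, "KWH_READING": None, "KWH_READING1": 100.0},
    {"KWH": None, "KWH_READING": 105.0, "KWH_READING1": None},
]
pd1, pd2 = make_preprocessed_copies(raw)
labels = label_customers({"c1": 0.81, "c2": 0.5})
```

Keeping both filled copies, rather than choosing one fill value, is what lets the later stages see the missing data encoded in two different ways.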
As a further refinement of the three-stage multi-view feature fusion method for electricity-theft classification prediction, the detailed steps of step 2.1) when all three views are selected for feature extraction are:
Step 2.1.1), compute monthly electricity-consumption statistics for each customer and use them as the time-window feature cluster; the statistics include the maximum, minimum, mean, and standard deviation of consumption;
Step 2.1.2), count the value anomalies in daily electricity consumption, same-day meter reading, and previous-day meter reading, and use them as the mutation feature cluster; the anomalies include a same-day meter reading lower than the previous-day meter reading, missing daily consumption, a missing same-day meter reading, a missing previous-day meter reading, and negative daily consumption;
Step 2.1.3), for each customer, convert daily electricity consumption into a time series in chronological order and extract time-series features such as the number of peaks, number of troughs, mean, quantiles, seasonal trend, and periodic trend, and use them as the time-series feature cluster;
Step 2.1.4), merge the time-window feature cluster, the mutation feature cluster, and the time-series feature cluster into one feature cluster;
Step 2.1.5), take the set formed by the time-window feature cluster, the mutation feature cluster, the time-series feature cluster, and the merged feature cluster as the feature cluster set of the preprocessed dataset.
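The three views and the merged cluster can be sketched for a single customer as follows. This is an illustrative reading of steps 2.1.1) to 2.1.5) under stated assumptions: the tuple layout, the function and key names, and the use of the population standard deviation are all choices made for the example, missing values are detected by comparing against the fill value, and the seasonal and periodic trend features are left out for brevity.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Tuple

# One row per day: (month, daily kWh, same-day reading, previous-day reading).
Day = Tuple[str, float, float, float]


def time_window_features(days: List[Day]) -> Dict[str, float]:
    """Step 2.1.1: per-month max, min, mean and standard deviation of consumption."""
    by_month: Dict[str, List[float]] = {}
    for month, kwh, _today, _prev in days:
        by_month.setdefault(month, []).append(kwh)
    feats: Dict[str, float] = {}
    for month, values in by_month.items():
        feats[f"{month}_max"] = max(values)
        feats[f"{month}_min"] = min(values)
        feats[f"{month}_mean"] = mean(values)
        feats[f"{month}_std"] = pstdev(values)
    return feats


def mutation_features(days: List[Day], fill: float = -1.0) -> Dict[str, float]:
    """Step 2.1.2: anomaly counts; `fill` is the value that marks missing data."""
    return {
        "reading_decreased": float(sum(
            1 for _m, _k, today, prev in days
            if today != fill and prev != fill and today < prev
        )),
        "kwh_missing": float(sum(1 for _m, kwh, _t, _p in days if kwh == fill)),
        "reading_missing": float(sum(1 for _m, _k, today, _p in days if today == fill)),
        "prev_reading_missing": float(sum(1 for _m, _k, _t, prev in days if prev == fill)),
        "kwh_negative": float(sum(1 for _m, kwh, _t, _p in days if kwh < 0 and kwh != fill)),
    }


def time_series_features(days: List[Day]) -> Dict[str, float]:
    """Step 2.1.3 (partial): peaks, troughs, mean and median of the daily series."""
    series = [kwh for _m, kwh, _t, _p in days]
    triples = list(zip(series, series[1:], series[2:]))
    return {
        "peaks": float(sum(1 for a, b, c in triples if b > a and b > c)),
        "troughs": float(sum(1 for a, b, c in triples if b < a and b < c)),
        "ts_mean": mean(series),
        "ts_median": median(series),
    }


def feature_cluster_set(days: List[Day], fill: float = -1.0) -> Dict[str, Dict[str, float]]:
    """Steps 2.1.4 and 2.1.5: three individual clusters plus their merged cluster."""
    window = time_window_features(days)
    mutation = mutation_features(days, fill)
    series = time_series_features(days)
    merged = {**window, **mutation, **series}
    return {"window": window, "mutation": mutation, "series": series, "merged": merged}


days = [
    ("2016-01", 5.0, 105.0, 100.0),
    ("2016-01", 7.0, 112.0, 105.0),
    ("2016-01", 3.0, 110.0, 112.0),
    ("2016-02", -1.0, -1.0, 110.0),
    ("2016-02", 6.0, 121.0, 115.0),
]
clusters = feature_cluster_set(days)
```

Each of the four entries in `clusters` would be fed to the classifiers separately, which is why the merged cluster is kept alongside the individual ones instead of replacing them.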
As a further refinement of the three-stage multi-view feature fusion method for electricity-theft classification prediction, the detailed steps of step 2.2) are:
For each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set:
Step 2.2.1), divide the training set into N parts of training data by random sampling of customers;
Step 2.2.2), for each part of training data:
take that part as the sub-validation set and the union of the remaining N-1 parts as the sub-training set; then, using each feature cluster in the feature cluster set in turn, predict with at least one binary classification algorithm the theft probability of the customers in that part and in the test set;
Step 2.2.3), merge the predictions obtained in step 2.2.2) for all parts of training data to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), for each customer in the test set, average the theft probabilities obtained in step 2.2.2) across the N parts of training data to obtain the predicted theft probability of each customer in the test set.
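Steps 2.2.1) to 2.2.4) describe what is usually called out-of-fold prediction. The sketch below shows that procedure for one feature cluster and one classifier. It is a hedged illustration: the `fit` interface and all names are assumptions, and `fit_rate_by_sign` is a deliberately trivial stand-in for a real binary classifier such as those the source lists.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# A fitted model maps one feature row to a theft probability.
Predictor = Callable[[Row], float]
# A fit function trains on rows and 0/1 labels and returns a fitted model.
Fit = Callable[[List[Row], List[int]], Predictor]


def fit_rate_by_sign(rows: List[Row], labels: List[int]) -> Predictor:
    """Toy classifier: theft rate among rows whose first feature is positive or not."""
    def rate(flag: bool) -> float:
        hits = [y for r, y in zip(rows, labels) if (r[0] > 0) == flag]
        return sum(hits) / len(hits) if hits else 0.5

    pos, neg = rate(True), rate(False)
    return lambda row: pos if row[0] > 0 else neg


def out_of_fold_predict(
    fit: Fit,
    train_rows: List[Row],
    train_labels: List[int],
    test_rows: List[Row],
    n_folds: int = 5,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    indices = list(range(len(train_rows)))
    random.Random(seed).shuffle(indices)           # 2.2.1: random split by customer
    folds = [indices[k::n_folds] for k in range(n_folds)]

    train_pred = [0.0] * len(train_rows)
    test_sum = [0.0] * len(test_rows)
    for held_out in folds:                         # 2.2.2: each part is held out once
        held = set(held_out)
        fit_idx = [i for i in indices if i not in held]
        model = fit([train_rows[i] for i in fit_idx],
                    [train_labels[i] for i in fit_idx])
        for i in held_out:                         # 2.2.3: collect held-out predictions
            train_pred[i] = model(train_rows[i])
        for j, row in enumerate(test_rows):
            test_sum[j] += model(row)
    test_pred = [s / n_folds for s in test_sum]    # 2.2.4: average the N test predictions
    return train_pred, test_pred


train_rows = [[1.0], [2.0], [-1.0], [-2.0], [3.0], [-3.0], [4.0], [-4.0], [5.0], [-5.0]]
train_labels = [1 if r[0] > 0 else 0 for r in train_rows]
test_rows = [[0.5], [-0.5]]
train_pred, test_pred = out_of_fold_predict(fit_rate_by_sign, train_rows, train_labels, test_rows)
```

Because every training customer is scored by a model that never saw that customer, the training-set probabilities can be reused as features in the next stage without leaking labels.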
As a further refinement of the three-stage multi-view feature fusion method for electricity-theft classification prediction, the binary classification algorithms used in step 2.2) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
As a further refinement of the three-stage multi-view feature fusion method for electricity-theft classification prediction, the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
As a further refinement of the three-stage multi-view feature fusion method for electricity-theft classification prediction, the linear classification model in step 4) is one of XGBoost with booster set to gblinear, LogisticRegression, and Linear Regression.
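The source names XGBoost with `booster` set to `gblinear` as one candidate linear model. As a hedged illustration only, parameter dictionaries for the two third-stage models might look like the following. `booster`, `objective`, and `max_depth` are documented XGBoost parameters, while the variable names and the `max_depth` value are assumptions, not values given in the source.

```python
# Hypothetical parameter sets for the two third-stage models.
# Both emit probabilities through the binary logistic objective.
linear_params = {"booster": "gblinear", "objective": "binary:logistic"}
tree_params = {"booster": "gbtree", "objective": "binary:logistic", "max_depth": 4}
```

Dictionaries of this shape are what would typically be passed as the `params` argument of `xgboost.train`.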
Compared with the prior art, the invention achieves the following technical effects through the above technical solution:
1. The method only needs to consider feature selection within the feature set of a single view, which avoids the large amount of computing and time resources that existing methods need for feature selection among thousands of feature dimensions;
2. Compared with existing machine learning or ensemble learning methods, the method is more effective on real-world datasets that contain large amounts of missing data; by increasing the diversity of datasets, the diversity of models, and resistance to over-fitting, it can improve prediction accuracy while reducing computation;
3. The method does not require modifying existing algorithms for classifying and predicting customer electricity-usage behavior, and can make full use of existing classification prediction algorithms.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of three-stage multi-view feature fusion in the present invention.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the accompanying drawing:
The invention discloses a three-stage multi-view feature fusion method for electricity-theft classification prediction, comprising the following steps:
Step 1), take the customer electricity data to be analyzed as the test set, and fill the missing values in daily electricity consumption, same-day meter reading, and previous-day meter reading with "-1" and with "0" respectively, forming two preprocessed datasets;
Step 2), for each preprocessed dataset:
Step 2.1), select at least two of the three views (time-window statistics, abnormal mutation-value statistics, and time-series analysis) and extract features from each; the feature values extracted from each view form an individual feature cluster; the individual feature clusters are then merged into one feature cluster; the individual feature clusters together with the merged feature cluster form the feature cluster set of that preprocessed dataset;
Step 2.2), for each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the electricity-theft probability of each customer in the training set of preset customer electricity data and in the test set;
Step 3), for each customer in the training set and the test set, the theft probabilities predicted for that customer across the two preprocessed datasets form that customer's set of predicted theft probabilities;
Step 4), using the predicted theft-probability sets of all customers in the training set and the test set as features, predict on the test set with a tree classification model and with a linear classification model, obtaining the final predicted theft probability of each customer in the customer electricity data to be analyzed;
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft-probability threshold; customers whose final predicted theft probability exceeds the threshold are classified as electricity-theft customers, and customers whose final predicted theft probability is less than or equal to the threshold are classified as normal customers.
The detailed steps of step 2.1) when all three views are selected for feature extraction are:
Step 2.1.1), compute monthly electricity-consumption statistics for each customer and use them as the time-window feature cluster; the statistics include the maximum, minimum, mean, and standard deviation of consumption;
Step 2.1.2), count the value anomalies in daily electricity consumption, same-day meter reading, and previous-day meter reading, and use them as the mutation feature cluster; the anomalies include a same-day meter reading lower than the previous-day meter reading, missing daily consumption, a missing same-day meter reading, a missing previous-day meter reading, and negative daily consumption;
Step 2.1.3), for each customer, convert daily electricity consumption into a time series in chronological order and extract time-series features such as the number of peaks, number of troughs, mean, quantiles, seasonal trend, and periodic trend, and use them as the time-series feature cluster;
Step 2.1.4), merge the time-window feature cluster, the mutation feature cluster, and the time-series feature cluster into one feature cluster;
Step 2.1.5), take the set formed by the time-window feature cluster, the mutation feature cluster, the time-series feature cluster, and the merged feature cluster as the feature cluster set of the preprocessed dataset.
The detailed steps of step 2.2) are:
For each feature cluster in the feature cluster set, use at least one binary classification algorithm with that feature cluster to predict the theft probability of each customer in the training set of preset customer electricity data and in the test set:
Step 2.2.1), divide the training set into N parts of training data by random sampling of customers;
Step 2.2.2), for each part of training data:
take that part as the sub-validation set and the union of the remaining N-1 parts as the sub-training set; then, using each feature cluster in the feature cluster set in turn, predict with at least one binary classification algorithm the theft probability of the customers in that part and in the test set;
Step 2.2.3), merge the predictions obtained in step 2.2.2) for all parts of training data to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), for each customer in the test set, average the theft probabilities obtained in step 2.2.2) across the N parts of training data to obtain the predicted theft probability of each customer in the test set.
The binary classification algorithms used in step 2.2) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression, and Gradient Boosting Decision Tree.
The tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network, and Gradient Boosting Decision Tree.
The linear classification model in step 4) is one of XGBoost with booster set to gblinear, LogisticRegression, and Linear Regression.
As shown in Fig. 1, in one embodiment of the invention, 2 preprocessed datasets are used; for simplicity, only 2 views are selected for feature extraction, namely time-window statistical features and abnormal mutation features; 2 classification algorithms are selected; and the data is divided into 5 parts during feature fusion (N=5).
This embodiment comprises the following steps:
Step 1), for the data to be predicted, fill the missing daily electricity consumption (KWH), same-day meter reading (KWH_READING), and previous-day meter reading (KWH_READING1) with -1 and with 0 respectively, generating two preprocessed files PD1 and PD2.
Step 2), extract features from PD1 and PD2 from 2 different views, time-window statistical features and abnormal mutation features, obtaining V11, V12, V21, and V22, the merged set V1A of V11 and V12, and the merged set V2A of V21 and V22:
Step 2.1), after grouping by customer, divide time into monthly time windows and compute features of daily electricity consumption within each window, including the maximum, minimum, median, mean, number of zeros, number of consecutive zeros, quantiles, and so on, as time-window features. Extract time-window features from PD1 and PD2 respectively to obtain V11 and V21;
Step 2.2), after grouping by customer and sorting by time in ascending order, count the occurrences of negative daily consumption, zero daily consumption, a same-day meter reading lower than the previous-day meter reading, and so on, as abnormal mutation features. Extract abnormal mutation features from PD1 and PD2 respectively to obtain V12 and V22;
Step 2.3), merge the feature sets of the multiple views of PD1, that is, merge V11 and V12, to obtain the merged feature set V1A; merge the feature sets of the multiple views of PD2, that is, merge V21 and V22, to obtain the merged feature set V2A;
Step 3), for each feature set, use two different classification prediction algorithms to predict the theft probability of the customers in the training set and the test set:
Step 3.1), for each feature set, divide the training set into 5 parts (N=5).
Step 3.2), take any 4 parts of training data, train a model with the classification prediction algorithm, and then predict the theft probability of the customers in the remaining part of training data and in the test data;
Step 3.3), merge the theft-probability predictions for the training data obtained in step 3.2) to obtain the theft probability of the customers in the whole training set; average the theft-probability predictions for the customers in the test set obtained in step 3.2) to obtain the predicted theft probability of the customers in the test set;
Step 3.4), apply steps 3.1), 3.2), and 3.3) with classification prediction algorithm M to each of the feature sets V11, V12, V1A, V21, V22, and V2A, obtaining the theft prediction probabilities M11, M12, M1A, M21, M22, and M2A; apply steps 3.1), 3.2), and 3.3) with classification prediction algorithm N (a classification prediction algorithm different from M) to each of the feature sets V11, V12, V1A, V21, V22, and V2A, obtaining the theft prediction probabilities N11, N12, N1A, N21, N22, and N2A;
Step 4), take the theft-probability predictions for the training-set customers from step 3) as training-set input features and the theft-probability predictions for the test set as test-set input features; predict the theft probability of the customers in the test set with a tree classification model and with a linear classification model, and average the two predictions to obtain the final customer theft prediction:
Step 4.1), take the prediction probabilities of the base models obtained in step 3) as features, join M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, and N2A using the customer number as the primary key, and perform classification prediction with the linear classification algorithm LogisticRegressionClassifier to obtain the predicted theft probability LA of the customers in the test set;
Step 4.2), take the prediction probabilities of the base models obtained in step 3) as features, join M11, N11, M12, N12, M1A, N1A, M21, N21, M22, N22, M2A, and N2A using the customer number as the primary key, and perform classification prediction with the tree classification algorithm XGBoost to obtain the predicted theft probability TA of the customers in the test set;
Step 4.3), average the customer theft-probability predictions from step 4.1) and step 4.2) and take the result as the customer's final predicted theft probability R;
Step 5), compare the final predicted theft probability of each customer in the customer electricity data to be analyzed with a preset theft-probability threshold; customers whose final predicted theft probability exceeds the threshold are classified as electricity-theft customers, and customers whose final predicted theft probability is less than or equal to the threshold are classified as normal customers.
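The bookkeeping of the third stage in this embodiment (twelve base-model columns joined on customer number, then LA and TA averaged into R) can be sketched as below. The data layout and function names are assumptions for illustration, and the LA and TA values are made-up placeholders standing in for the outputs of the linear and tree models; no model is trained here.

```python
from typing import Dict, List

# Base-model outputs: column name -> {customer number -> predicted theft probability}.
BasePredictions = Dict[str, Dict[str, float]]

# 2 preprocessed files x 3 feature sets x 2 algorithms (M and N) = 12 columns,
# listed in the same order as in steps 4.1) and 4.2).
COLUMN_ORDER = [
    f"{algo}{pd}{view}" for pd in "12" for view in ("1", "2", "A") for algo in "MN"
]


def join_by_customer(preds: BasePredictions) -> Dict[str, List[float]]:
    """Join the twelve probability columns using the customer number as key."""
    customers = sorted(preds[COLUMN_ORDER[0]])
    return {c: [preds[name][c] for name in COLUMN_ORDER] for c in customers}


def average_final(la: Dict[str, float], ta: Dict[str, float]) -> Dict[str, float]:
    """Step 4.3: R is the mean of the linear-model output LA and the tree-model output TA."""
    return {c: (la[c] + ta[c]) / 2.0 for c in la}


# Placeholder inputs: every base model predicts the same value per customer.
preds = {name: {"c1": 0.25, "c2": 0.75} for name in COLUMN_ORDER}
features = join_by_customer(preds)

la = {"c1": 0.25, "c2": 0.5}   # hypothetical linear-model probabilities
ta = {"c1": 0.75, "c2": 1.0}   # hypothetical tree-model probabilities
r = average_final(la, ta)
```

The rows of `features` are what the linear and tree models would take as input; averaging their two outputs is the only fusion step the embodiment applies at this stage.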
The basic principle of the invention is as follows. First, the missing values in the customer electricity data to be analyzed are filled in different ways to generate several different preprocessed datasets, which increases data diversity so that subsequent feature extraction and models can make better use of the information implicit in the missing data. Second, during feature extraction, feature sets are built for each preprocessed dataset from multiple views, such as time-window statistics, mutation-value statistics, and time-series features, and the features extracted from the multiple views are also merged into one feature set, which characterizes each preprocessed dataset better. Because the feature sets are built from different views, they differ strongly from one another, which avoids interference between features and reduces the computation needed for feature selection. In addition, because a merged feature set combining the feature clusters of the different views is built for every preprocessed dataset, the feature sets of the different views can be fused more effectively, which benefits the final model fusion. During model construction, several existing mainstream classification algorithms are used, including XGBoost, Gradient Boosting Decision Tree, and Neural Network, which increases algorithm diversity so that the combination of different algorithms can characterize the data from different angles. Finally, taking the mean of the predicted probabilities of the tree model and the linear model as the final prediction helps avoid over-fitting. The method achieves more accurate classification prediction of customer theft probability with fewer resources and therefore has better practical engineering value.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, are not to be interpreted in an idealized or overly formal sense.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the invention in detail. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (6)
1. A three-stage multi-view feature fusion method for electricity-theft classification prediction, characterized by comprising the following steps:
Step 1), taking the customer electricity-consumption data to be analyzed as the test set, and filling the missing values in the daily electricity consumption, the same-day meter reading and the previous-day meter reading with "-1" and with "0" respectively, forming two preprocessed data sets;
Step 2), for each preprocessed data set:
Step 2.1), selecting at least two of the three views of time-window statistics, abnormal mutation-value statistics and time-series analysis to extract features, the set of feature values extracted from each view serving as an individual feature cluster; then merging the extracted individual feature clusters into one feature cluster, and taking the set formed by the individual feature clusters and the merged feature cluster as the feature-cluster set of that preprocessed data set;
Step 2.2), for each feature cluster in the feature-cluster set, using at least one binary classification algorithm with that feature cluster to predict the electricity-theft probability of each customer in the training set of preset customer electricity-consumption data and in the test set;
Step 3), for each customer in the training set and the test set, forming the customer's set of predicted theft probabilities from all the predicted theft probabilities obtained on the two preprocessed data sets;
Step 4), taking the sets of predicted theft probabilities of all customers in the training set and the test set as features, predicting on the test set with a tree classification model and with a linear classification model respectively, and averaging the two resulting predicted probabilities to obtain the final predicted theft probability of each customer in the customer electricity-consumption data to be analyzed;
Step 5), comparing the final predicted theft probability of each customer in the customer electricity-consumption data to be analyzed with a preset theft probability threshold, classifying customers whose final predicted theft probability is greater than the preset threshold as electricity-theft customers, and classifying customers whose final predicted theft probability is less than or equal to the preset threshold as normal customers.
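As an illustration only, steps 1), 4) and 5) of claim 1 could be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the field names, the use of `None` to mark a missing value, and the default threshold of 0.5 are all hypothetical, and the two model probabilities are supplied directly instead of coming from trained models.

```python
from typing import Dict, List, Optional

# One daily record per customer; None marks a missing value (assumption).
Record = Dict[str, Optional[float]]

# Hypothetical field names for daily consumption and the two meter readings.
FIELDS = ("daily_usage", "reading_today", "reading_prev")


def fill_missing(records: List[Record], fill_value: float) -> List[Record]:
    """Return a copy of the records with every missing field set to fill_value."""
    return [
        {f: (fill_value if r.get(f) is None else r[f]) for f in FIELDS}
        for r in records
    ]


def preprocess(records: List[Record]) -> Dict[str, List[Record]]:
    """Step 1: build the two preprocessed copies, filled with -1 and with 0."""
    return {
        "fill_-1": fill_missing(records, -1.0),
        "fill_0": fill_missing(records, 0.0),
    }


def final_label(p_tree: float, p_linear: float, threshold: float = 0.5) -> str:
    """Steps 4-5: average the two model probabilities, then apply the threshold."""
    p_final = (p_tree + p_linear) / 2.0
    # Strictly greater than the threshold means theft; otherwise normal.
    return "theft" if p_final > threshold else "normal"


raw = [
    {"daily_usage": 3.2, "reading_today": 103.2, "reading_prev": 100.0},
    {"daily_usage": None, "reading_today": None, "reading_prev": 103.2},
]
copies = preprocess(raw)
print(copies["fill_-1"][1])
print(copies["fill_0"][1])
print(final_label(0.9, 0.4), final_label(0.25, 0.75))
```

Keeping both filled copies lets the later stages see the same data under two different missing-value conventions, which is what produces the two groups of predicted probabilities combined in step 3).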
2. The three-stage multi-view feature fusion method for electricity-theft classification prediction according to claim 1, characterized in that, when all three views are selected in step 2.1), the detailed steps of feature extraction are:
Step 2.1.1), computing statistics of each customer's electricity consumption for each month and taking them as the time-window feature cluster, the statistics including the maximum, minimum, mean and standard deviation of the electricity consumption;
Step 2.1.2), counting the numerical mutations in the daily electricity consumption, the same-day meter reading and the previous-day meter reading and taking them as the mutation feature cluster, the numerical mutations including: the same-day meter reading being less than the previous-day meter reading, the daily electricity consumption being missing, the same-day meter reading being missing, the previous-day meter reading being missing, and the daily electricity consumption being negative;
Step 2.1.3), for each customer, converting the daily electricity consumption into a time series in chronological order and extracting the time-series features of the series, namely the number of peaks, the number of troughs, the mean, the quantiles, the seasonal trend and the periodic trend, as the temporal feature cluster;
Step 2.1.4), merging the time-window feature cluster, the mutation feature cluster and the temporal feature cluster into one feature cluster;
Step 2.1.5), taking the set formed by the time-window feature cluster, the mutation feature cluster, the temporal feature cluster and the merged feature cluster as the feature-cluster set of the preprocessed data set.
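The three views of claim 2 could be sketched roughly as below. This is a simplified sketch, not the patented feature set: all function and key names are hypothetical, a single window stands in for the per-month statistics, missing values are represented as `None` rather than the filled values of step 1), and the seasonal and periodic trend features are omitted.

```python
import statistics
from typing import Dict, List, Optional, Sequence

Values = Sequence[Optional[float]]  # None marks a missing value (assumption)


def window_features(usage: List[float]) -> Dict[str, float]:
    """Time-window view: max, min, mean and standard deviation of one window."""
    return {
        "max": max(usage),
        "min": min(usage),
        "mean": statistics.fmean(usage),
        "std": statistics.pstdev(usage),
    }


def mutation_features(usage: Values, today: Values, prev: Values) -> Dict[str, int]:
    """Mutation view: counts of missing values and implausible readings."""
    return {
        "usage_missing": sum(u is None for u in usage),
        "today_missing": sum(t is None for t in today),
        "prev_missing": sum(p is None for p in prev),
        "reading_decreased": sum(
            t is not None and p is not None and t < p for t, p in zip(today, prev)
        ),
        "usage_negative": sum(u is not None and u < 0 for u in usage),
    }


def series_features(usage: List[float]) -> Dict[str, float]:
    """Time-series view: peak and trough counts, mean and quartiles."""
    inner = range(1, len(usage) - 1)
    peaks = sum(usage[i - 1] < usage[i] > usage[i + 1] for i in inner)
    troughs = sum(usage[i - 1] > usage[i] < usage[i + 1] for i in inner)
    q1, median, q3 = statistics.quantiles(usage, n=4)
    return {"peaks": peaks, "troughs": troughs, "mean": statistics.fmean(usage),
            "q1": q1, "median": median, "q3": q3}


def feature_clusters(usage: Values, today: Values, prev: Values) -> Dict[str, dict]:
    """Build the three individual clusters plus the merged cluster (steps 2.1.4-2.1.5)."""
    observed = [u for u in usage if u is not None]
    clusters: Dict[str, dict] = {
        "window": window_features(observed),
        "mutation": mutation_features(usage, today, prev),
        "series": series_features(observed),
    }
    # The merged cluster is the union of all views, with view-prefixed names.
    merged = {f"{view}_{name}": value
              for view, feats in clusters.items() for name, value in feats.items()}
    clusters["merged"] = merged
    return clusters


usage = [3.0, 5.0, 2.0, None, 4.0, -1.0, 6.0]
today = [103.0, 108.0, 110.0, None, 114.0, 113.0, 119.0]
prev = [100.0, 103.0, 108.0, 110.0, None, 114.0, 113.0]
clusters = feature_clusters(usage, today, prev)
print(sorted(clusters))
print(clusters["mutation"])
```

Returning the individual clusters alongside the merged one mirrors step 2.1.5), where each cluster is later used on its own as model input.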
3. The three-stage multi-view feature fusion method for electricity-theft classification prediction according to claim 2, characterized in that the detailed steps of step 2.2) are:
For each feature cluster in the feature-cluster set, at least one binary classification algorithm is used with that feature cluster to predict the electricity-theft probability of each customer in the training set of preset customer electricity-consumption data and in the test set, as follows:
Step 2.2.1), dividing the data of the training set into N parts of training data by random sampling of customers;
Step 2.2.2), for each part of training data: taking that part as the sub-validation set and the union of the remaining N-1 parts of training data as the sub-training set, and, using each feature cluster in the feature-cluster set in turn, predicting the theft probability of the customers in the sub-validation set and in the test set with at least one binary classification algorithm;
Step 2.2.3), merging the prediction results for all parts of training data in step 2.2.2) to obtain the predicted theft probability of each customer in the training set;
Step 2.2.4), for each customer in the test set, averaging the theft probabilities predicted in step 2.2.2) with each part of training data, to obtain the predicted theft probability of each customer in the test set.
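Steps 2.2.1) to 2.2.4) amount to N-fold out-of-fold prediction, which could be sketched as follows for one feature cluster and one classifier. This is a sketch under assumptions: each row is taken to be one customer, and `fit_stump` is a hypothetical stand-in learner used only to keep the example self-contained; the claims name algorithms such as XGBoost or Logistic Regression for this role.

```python
import random
from typing import Callable, List, Tuple

Row = Tuple[float, ...]
Model = Callable[[Row], float]            # maps a feature row to a theft probability
Fit = Callable[[List[Row], List[int]], Model]


def fit_stump(X: List[Row], y: List[int]) -> Model:
    """Hypothetical stand-in learner: split feature 0 at its training mean and
    return the positive rate observed on each side of the split."""
    cut = sum(row[0] for row in X) / len(X)

    def rate(labels: List[int]) -> float:
        # Fall back to the overall positive rate if one side is empty.
        return sum(labels) / len(labels) if labels else sum(y) / len(y)

    low = rate([t for row, t in zip(X, y) if row[0] <= cut])
    high = rate([t for row, t in zip(X, y) if row[0] > cut])
    return lambda row: high if row[0] > cut else low


def out_of_fold(X_train: List[Row], y_train: List[int], X_test: List[Row],
                fit: Fit, n_folds: int = 5, seed: int = 0):
    """Return out-of-fold training predictions and fold-averaged test predictions."""
    idx = list(range(len(X_train)))
    random.Random(seed).shuffle(idx)               # step 2.2.1: random split
    folds = [idx[k::n_folds] for k in range(n_folds)]
    train_pred = [0.0] * len(X_train)
    test_sum = [0.0] * len(X_test)
    for valid in folds:                            # step 2.2.2: one model per fold
        held_out = set(valid)
        sub = [i for i in idx if i not in held_out]
        model = fit([X_train[i] for i in sub], [y_train[i] for i in sub])
        for i in valid:                            # step 2.2.3: each row predicted once
            train_pred[i] = model(X_train[i])
        for j, row in enumerate(X_test):
            test_sum[j] += model(row)
    test_pred = [s / n_folds for s in test_sum]    # step 2.2.4: average over folds
    return train_pred, test_pred


X_train = [(float(v),) for v in range(10)]
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X_test = [(1.0,), (8.0,)]
train_pred, test_pred = out_of_fold(X_train, y_train, X_test, fit_stump, n_folds=5)
print(train_pred)
print(test_pred)
```

Because every training customer is scored by a model that never saw that customer, the resulting probabilities can serve as second-stage features without leaking the training labels.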
4. The three-stage multi-view feature fusion method for electricity-theft classification prediction according to claim 3, characterized in that the binary classification algorithms used in step 2.2) include XGBoost, LightGBM, Keras, Neural Network, Logistic Regression and Gradient Boosting Decision Tree.
5. The three-stage multi-view feature fusion method for electricity-theft classification prediction according to claim 3, characterized in that the tree classification model in step 4) is one of XGBoost, LightGBM, Keras, Neural Network and Gradient Boosting Decision Tree.
6. The three-stage multi-view feature fusion method for electricity-theft classification prediction according to claim 3, characterized in that the linear classification model in step 4) is one of XGBoost with its booster set to gblinear, Logistic Regression and Linear Regression.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710036718.XA CN106909933B (en) | 2017-01-18 | 2017-01-18 | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106909933A (en) | 2017-06-30 |
CN106909933B (en) | 2018-05-18 |
Family
ID=59206516
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201710036718.XA, CN106909933B (en) | Expired - Fee Related | 2017-01-18 | 2017-01-18 | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106909933B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107492043A (en) * | 2017-09-04 | 2017-12-19 | 国网冀北电力有限公司电力科学研究院 | stealing analysis method and device |
CN107862347A (en) * | 2017-12-04 | 2018-03-30 | 国网山东省电力公司济南供电公司 | A kind of discovery method of the electricity stealing based on random forest |
CN108490288B (en) * | 2018-03-09 | 2019-04-16 | 华南师范大学 | A kind of stealing detection method and system |
CN108961215A (en) * | 2018-06-05 | 2018-12-07 | 上海大学 | Parkinson's disease assistant diagnosis system and method based on Multimodal medical image |
CN109359674A (en) * | 2018-09-27 | 2019-02-19 | 智庭(北京)智能科技有限公司 | A kind of smart lock method for detecting abnormality based on multi-model blending |
CN109858679A (en) * | 2018-12-30 | 2019-06-07 | 国网浙江省电力有限公司 | A kind of opposing electricity-stealing for the man-machine object of combination checks monitoring system and its working method |
CN110119755A (en) * | 2019-03-22 | 2019-08-13 | 国网浙江省电力有限公司信息通信分公司 | Electricity method for detecting abnormality based on Ensemble learning model |
CN111507507B (en) * | 2020-03-24 | 2023-04-18 | 重庆森鑫炬科技有限公司 | Big data-based monthly water consumption prediction method |
CN112101420A (en) * | 2020-08-17 | 2020-12-18 | 广东工业大学 | Abnormal electricity user identification method for Stacking integration algorithm under dissimilar model |
CN112232985B (en) * | 2020-10-15 | 2023-02-28 | 国网天津市电力公司 | Power distribution and utilization data monitoring method and device for ubiquitous power Internet of things |
CN112485491A (en) * | 2020-11-23 | 2021-03-12 | 国网北京市电力公司 | Power stealing identification method and device |
CN112561569B (en) * | 2020-12-07 | 2024-02-27 | 上海明略人工智能(集团)有限公司 | Dual-model-based store arrival prediction method, system, electronic equipment and storage medium |
CN113128567A (en) * | 2021-03-25 | 2021-07-16 | 云南电网有限责任公司 | Abnormal electricity consumption behavior identification method based on electricity consumption data |
CN113435513B (en) * | 2021-06-28 | 2024-06-04 | 平安科技(深圳)有限公司 | Deep learning-based insurance customer grouping method, device, equipment and medium |
CN116954591B (en) * | 2023-06-15 | 2024-02-23 | 天云融创数据科技(北京)有限公司 | Generalized linear model training method, device, equipment and medium in banking field |
CN117033916B (en) * | 2023-07-10 | 2024-07-23 | 国网四川省电力公司营销服务中心 | Power theft detection method based on neural network |
CN116933986B (en) * | 2023-09-19 | 2024-01-23 | 国网湖北省电力有限公司信息通信公司 | Electric power data safety management system based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102866321A (en) * | 2012-08-13 | 2013-01-09 | 广东电网公司电力科学研究院 | Self-adaptive stealing-leakage prevention diagnosis method |
CN103778567A (en) * | 2014-01-21 | 2014-05-07 | 深圳供电局有限公司 | Method and system for discriminating abnormal electricity utilization of user |
CN105069476A (en) * | 2015-08-10 | 2015-11-18 | 国网宁夏电力公司 | Method for identifying abnormal wind power data based on two-stage integration learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070136082A1 (en) * | 2005-12-14 | 2007-06-14 | Southern Company Services, Inc. | System and method for energy diversion investigation management |
Non-Patent Citations (2)
Title |
---|
Anomaly detection of power Consumption based on waveform feature recognition;Tang Yijia et al;《The 11th International Conference on Computer Science&Education 》;20161231;587-591 * |
应用大数据技术的反窃电分析;陈文瑛 等;《电子测量与仪器学报》;20161031;第30卷(第10期);1558-1566 * |
Also Published As
Publication number | Publication date |
---|---|
CN106909933A (en) | 2017-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106909933B (en) | A kind of stealing classification Forecasting Methodology of three stages various visual angles Fusion Features | |
CN111738462B (en) | Fault first-aid repair active service early warning method for electric power metering device | |
CN111612650B (en) | DTW distance-based power consumer grouping method and system | |
CN110082699A (en) | A kind of low-voltage platform area intelligent electric energy meter kinematic error calculation method and its system | |
CN110232203A (en) | Knowledge distillation optimization RNN has a power failure prediction technique, storage medium and equipment in short term | |
CN110309884A (en) | Electricity consumption data anomalous identification system based on ubiquitous electric power Internet of Things net system | |
CN111582548A (en) | Power load prediction method based on multivariate user behavior portrait | |
CN112149873A (en) | Low-voltage transformer area line loss reasonable interval prediction method based on deep learning | |
CN111738331A (en) | User classification method and device, computer-readable storage medium and electronic device | |
CN104346698A (en) | Catering member big data analysis and checking system based on cloud computing and data mining | |
CN110147389A (en) | Account number treating method and apparatus, storage medium and electronic device | |
CN114611738A (en) | Load prediction method based on user electricity consumption behavior analysis | |
CN110009427B (en) | Intelligent electric power sale amount prediction method based on deep circulation neural network | |
CN112508254B (en) | Method for determining investment prediction data of transformer substation engineering project | |
CN116611589B (en) | Power failure window period prediction method, system, equipment and medium for main network power transmission and transformation equipment | |
CN114021425A (en) | Power system operation data modeling and feature selection method and device, electronic equipment and storage medium | |
CN107274025B (en) | System and method for realizing intelligent identification and management of power consumption mode | |
Wang et al. | Cloud computing and extreme learning machine for a distributed energy consumption forecasting in equipment-manufacturing enterprises | |
Sari et al. | The effectiveness of hybrid backpropagation Neural Network model and TSK Fuzzy Inference System for inflation forecasting | |
CN114676931B (en) | Electric quantity prediction system based on data center technology | |
Tee et al. | Short-term load forecasting using artificial neural networks | |
Ignatiadis et al. | Forecasting residential monthly electricity consumption using smart meter data | |
CN113642632B (en) | Power system customer classification method and device based on self-adaptive competition and equalization optimization | |
CN114638171A (en) | Power grid project investment prediction method and device, storage medium and equipment | |
CN114581263A (en) | Power grid load analysis method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20210416. Address after: 210046, 66 new model street, Gulou District, Jiangsu, Nanjing. Patentee after: NANJING University OF POSTS AND TELECOMMUNICATIONS; STATE GRID ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd. Address before: 210046, 66 new model street, Gulou District, Jiangsu, Nanjing. Patentee before: NANJING University OF POSTS AND TELECOMMUNICATIONS |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180518 |