CN110210686A - Method for constructing an electricity charge risk model based on electric power big data - Google Patents
Method for constructing an electricity charge risk model based on electric power big data
- Publication number
- CN110210686A (application CN201910509737.9A)
- Authority
- CN
- China
- Prior art keywords
- user
- arrearage
- characteristic
- risk
- electricity charge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a method for constructing an electricity charge risk model based on electric power big data. First, the historical electricity consumption data of current electricity users is obtained from State Grid's internal marketing system. Then, an electricity charge risk model is constructed with the logistic regression algorithm, using recall, precision, and the F1 score on the historical feature data as evaluation metrics. The model predicts the risk level of each electricity user according to a preset threshold I and threshold II. The model is verified by comparing its output with the users' actual payment behavior, and the parameters of the logistic regression algorithm are optimized according to the verification result. The invention builds the model from the historical information of electricity users, uses it to predict whether a user will fall into arrears in the following month, and automatically screens users by level of arrears risk. Marketing personnel can then take timely prevention and control measures for at-risk users based on the predictions, which saves substantial manpower and material resources and improves working efficiency.
Description
Technical field
The present invention relates to the construction of electricity charge risk models, and in particular to a method for constructing an electricity charge risk model based on electric power big data.
Background art
Electricity fee collection is a major systems undertaking whose results are closely tied to the business performance of power supply enterprises, and it has always been a key part of power marketing. Power supply enterprises have very large customer bases, but customers differ greatly in creditworthiness, which strongly affects whether their electricity charges are paid. Some customers, pursuing short-term economic gain, deliberately fall into arrears and tie up the power supply enterprise's funds, which seriously affects the recovery of electricity charges and the enterprise's further development.
At present, the risk of electricity fee collection is assessed mainly by hand. Experienced collection staff analyze a user's payment history with statistical tools to judge how likely the user is to fall into arrears. This approach requires staff to have relevant background experience and to be familiar with the details of different users, and it takes a long period of accumulated experience before they can judge arrears risk with reasonable accuracy. It is therefore difficult to apply at scale. As the number of electricity users grows and their usage environments become more complex, the traditional approach can no longer cope. An objective, automated method for assessing fee collection risk is needed, one that evaluates a user's arrears risk automatically from the user's historical information and issues an early warning for users whose risk exceeds a given threshold.
Summary of the invention
Traditional fee collection relies on collection staff whose background experience differs, which leads to inconsistent estimates of collection risk. To address this technical problem, the invention proposes a method for constructing an electricity charge risk model based on electric power big data. It provides an objective, automated method for assessing fee collection risk that gives marketing personnel early warning so that they can take prevention and control measures, allowing fee collection to proceed smoothly.
The technical solution of the invention is realized as follows:
A method for constructing an electricity charge risk model based on electric power big data, comprising:
S1. Obtaining the historical feature data of current electricity users from State Grid's internal marketing system;
S2. Constructing an electricity charge risk model with the logistic regression algorithm, using recall, precision, and the F1 score on the historical feature data as evaluation metrics;
S3. Using the electricity charge risk model to predict the risk level of each electricity user according to a preset threshold I and threshold II;
S4. Verifying the electricity charge risk model by comparing its output with the users' actual payment behavior, and optimizing the parameters of the logistic regression algorithm according to the verification result.
Preferably, the historical feature data of the electricity users is obtained as follows:
S11. Feature data preprocessing:
(1) Feature data cleaning: after the feature data is obtained from State Grid's internal marketing system, the outliers and missing values in it are handled; outliers are replaced with the value "1", and missing values are filled with the value "0";
(2) Feature data format conversion: text-type feature data is converted to numeric feature data, and categorical feature data is category-encoded;
S12. Historical arrears behavior correlation analysis:
(1) Historical arrears behavior correlation analysis: the users' historical feature data is collected, and the correlation between a user's arrears status and the user's payment history is analyzed;
(2) Feature correlation analysis: the correlations between different features are analyzed to find the features that are highly correlated with whether a user falls into arrears;
S13. Feature data expansion:
The feature data is expanded using information on each user's electricity consumption behavior over the most recent 6 months; this information comprises the maximum, minimum, mean, variance, and standard deviation of the feature data;
S14. Feature importance analysis:
After the feature data is expanded, it is standardized, and the importance of each feature is calculated using the Gini index.
Preferably, constructing the electricity charge risk model comprises four parts: at-risk user selection, feature selection, training set selection, and algorithm selection. The specific procedure is:
S21. At-risk user selection: two years of data are selected as the object of observation; users who have never been in arrears are removed from the data, and users with an arrears record are treated as at-risk users;
S22. Feature selection: based on the feature importance from step S14, the features with the lowest importance are removed in successive experiments, with recall, precision, and the F1 score as the evaluation metrics, and the group of features with the best experimental result is saved to the feature set;
S23. Training set selection: electricity consumption data from different periods of the at-risk users is tried in turn, and the group of data with the best experimental result is selected as the training set;
S24. Algorithm selection: a time series algorithm, a neural network algorithm, the SVM algorithm, the random forest algorithm, and the logistic regression algorithm are compared in terms of prediction accuracy and running time, and the logistic regression algorithm is selected to construct the electricity charge risk model.
Preferably, the risk level of an electricity user is predicted with the electricity charge risk model as follows:
S31. Risk level division: the electricity charge risk model predicts the probability that each at-risk user will be in arrears in the next month, and different values of threshold I are set to divide the at-risk users into high-risk, medium-risk, and low-risk users;
S32. For users of the same risk level, a threshold II is set to divide them into users in arrears and users not in arrears: a user whose arrears probability is greater than or equal to threshold II is judged to fall into arrears in the predicted month, and a user whose arrears probability is less than threshold II is judged not to fall into arrears in the predicted month.
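The rule in S32 can be sketched as a single comparison. This is a minimal illustration, not the patented implementation: the function name, the user identifiers, and the value chosen for threshold II are all hypothetical, since the text at this point does not fix a value.

```python
def predict_arrears(probability: float, threshold_ii: float) -> bool:
    """Return True when the user is judged to fall into arrears in the predicted month.

    Follows S32: probability >= threshold II means arrears, otherwise no arrears.
    """
    return probability >= threshold_ii


# Hypothetical arrears probabilities for three users of the same risk level.
probabilities = {"user_a": 0.91, "user_b": 0.75, "user_c": 0.70}
threshold_ii = 0.75  # illustrative value only (assumption)

predictions = {
    user: predict_arrears(p, threshold_ii) for user, p in probabilities.items()
}
```

The comparison is inclusive on the threshold, matching the "greater than or equal to" wording of S32.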
Preferably, the recall in step S22 is the proportion of users actually in arrears that are correctly predicted:
$$\mathrm{recall} = \frac{TP}{TP + FN}$$
The precision is the proportion of users predicted to be in arrears who are actually in arrears:
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
The F1 score is the harmonic mean of recall and precision:
$$F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where TP is the number of users who are actually in arrears and are predicted to be in arrears, TN is the number who are actually not in arrears and are predicted not to be in arrears, FN is the number who are actually in arrears but are predicted not to be in arrears, and FP is the number who are actually not in arrears but are predicted to be in arrears.
Preferably, the logistic regression algorithm in step S24 is implemented as follows:
Let a sample be {X, y}, where y takes the value 0 or 1: y = 0 means "not in arrears, i.e. the user paid the electricity charge on time", and y = 1 means "in arrears, i.e. the user failed to pay the electricity charge on time". X is an n-dimensional sample feature vector with components $x_1, x_2, \dots, x_n$. The probability that sample X is in arrears is:
$$P(y = 1 \mid X) = \frac{1}{1 + e^{-g(x)}} \tag{1}$$
where $g(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$, and θ denotes the regression coefficients $\theta_0, \theta_1, \dots, \theta_n$.
The values of the regression coefficients θ are obtained as follows:
S24-1. Initialize the values of the regression coefficients θ;
S24-2. Substitute the regression coefficients θ into formula (1) to obtain the output value $\hat{y}$ of the electricity charge risk model;
S24-3. Compute, according to formula (2), the error between the model output $\hat{y}$ and the actual value y:
[formula (2)]
S24-4. Update each regression coefficient $\theta_i$ according to formula (3):
[formula (3)]
where i = 1, 2, 3, …, n and α is a constant coefficient;
S24-5. Set a threshold t and check whether the stopping condition on t holds; if it does, go to step S24-6, otherwise return to step S24-2;
S24-6. Output the values of the regression coefficients $\theta_i$ and determine the logistic regression model from them.
Beneficial effects of the invention: the invention constructs an electricity charge risk model from the historical information of electricity users, uses the model to predict whether a user will fall into arrears in the following month, and automatically screens users by level of arrears risk. Marketing personnel take timely prevention and control measures for at-risk users based on the predictions, which saves substantial manpower and material resources and improves working efficiency.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, a method for constructing an electricity charge risk model based on electric power big data comprises the following specific steps:
S1. Obtain the historical feature data of current electricity users from State Grid's internal marketing system.
The historical feature data of the electricity users is obtained as follows:
S11. Feature data preprocessing:
(1) Feature data cleaning:
After the feature data is obtained from State Grid's internal marketing system, it is cleaned. First, a computer program outputs the maximum and minimum of each feature so that each feature can be checked against its normal range and the outliers and missing values identified. An outlier is a value far outside the normal range; a missing value is an empty value. Outliers are replaced according to their business meaning. For example, in the feature "jfjsl" (on-time payment rate), the value "999999" does not satisfy the business requirements and is replaced with the value "1". The features "yszb" (advance payment offset ratio) and "ysnumzb" (advance payment carry-over count ratio) are handled in the same way as "jfjsl". Missing values are filled with the value "0".
(2) Data format conversion: text-type data is converted to numeric data, and categorical data is category-encoded.
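The cleaning and encoding rules of S11 can be sketched as follows. This is a minimal illustration under stated assumptions: the text only names "999999" as the out-of-range value for "jfjsl", so treating the same sentinel as the outlier for "yszb" and "ysnumzb" is an assumption, and the helper names and the integer coding scheme are hypothetical.

```python
SENTINEL_OUTLIER = 999999  # out-of-range value cited in the text for "jfjsl"
# Assumption: the same sentinel marks outliers in the other two ratio features.
RATIO_FEATURES = ("jfjsl", "yszb", "ysnumzb")


def clean_record(record: dict) -> dict:
    """Apply the S11 cleaning rules to one user record.

    Missing values (None) become 0; sentinel outliers in the ratio
    features become 1; every other value is kept unchanged.
    """
    cleaned = {}
    for name, value in record.items():
        if value is None:
            cleaned[name] = 0
        elif name in RATIO_FEATURES and value == SENTINEL_OUTLIER:
            cleaned[name] = 1
        else:
            cleaned[name] = value
    return cleaned


def encode_categories(values: list) -> list:
    """Map each distinct category label to an integer code,
    numbered in order of first appearance."""
    codes: dict = {}
    return [codes.setdefault(v, len(codes)) for v in values]
```

For example, `clean_record({"jfjsl": 999999, "yszb": None})` replaces the sentinel with 1 and the missing value with 0.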
S12. Correlation analysis:
Correlation analysis comprises two parts: historical arrears behavior correlation analysis and feature correlation analysis.
(1) Historical arrears behavior correlation analysis: the historical data of the electricity users is collected, and the correlation between a user's arrears status and payment history is analyzed. For each user, the Pearson coefficient between months is calculated. The analysis shows that whether a user falls into arrears in the current month is correlated with the user's electricity consumption behavior in recent months, and that the Pearson coefficient decreases as the time interval grows, so the correlation gradually weakens. The Pearson coefficient measures the strength of the linear correlation between two variables and takes values in [-1, 1]. A positive coefficient indicates positive correlation, a negative coefficient indicates negative correlation, and the larger the absolute value, the stronger the correlation:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where n is the number of samples, X and Y are the variables, $X_i$ and $Y_i$ are the values of X and Y, and $\bar{X}$ and $\bar{Y}$ are the means of X and Y.
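The Pearson coefficient described above can be computed directly from its definition. The sketch below uses only the standard library; the function name and the choice to raise an error for constant inputs (where the coefficient is undefined) are illustrative assumptions.

```python
import math


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("coefficient is undefined for a constant sequence")
    return cov / (dev_x * dev_y)
```

Applied to a user's monthly arrears flags at increasing month offsets, this gives the kind of per-interval coefficient that the analysis compares.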
(2) Feature correlation analysis: the correlations between different features are analyzed to find the features that are highly correlated with whether a user falls into arrears. The raw data extracted from the electric power marketing system contains 34 basic features, listed in Table 1:
Table 1. Basic features
Feature correlation analysis is performed on the 34 basic features: the Pearson coefficients between features are used to analyze their mutual correlations and to find the features that are highly correlated with whether a user falls into arrears.
S13. Feature expansion:
Extensive statistical analysis of the data and a series of experiments showed that the 34 features alone cannot accurately describe whether a user is likely to fall into arrears. The basic features are therefore expanded in several steps.
From the historical arrears behavior correlation analysis in step S12, a user's payment behavior in the current month is correlated to varying degrees with that of the preceding 6 months, so the user's features from the preceding 6 months are added to the current month's data. Taking the expansion of "yszb" (advance payment offset ratio) as an example, the method is shown in Table 2:
Table 2. Expansion based on historical arrears behavior correlation
Then 14 of the 24 features are expanded further with the maximum, minimum, mean, variance, and standard deviation of the feature data. Taking the expansion of "yszb" (advance payment offset ratio) as an example, the method is shown in Table 3:
Table 3. Expansion based on feature correlation
The 14 features are: "yszb", "jfjsl", "cashchk_day_num", "hksc", "t_pq", "t_pq_sum", "release_day", "charge_day", "end_day", "end_day2", "remain_daynum", "remain_daynum2", "all_daynum", "all_daynum2". These features are expanded using the method of Table 3.
Feature expansion yields 236 features in total. Each electricity user also has a monthly label indicating whether the user is in arrears, "sfyq", and one further feature is added, "sfyq_sum" (the total number of arrears occurrences in the preceding 6 months), giving 238 features in the end.
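The two expansion steps of S13 can be sketched for a single feature. This is an illustration only: the contents of Tables 2 and 3 are not available in this text, so the derived column names (`_lag1`, `_max`, and so on) are hypothetical, and the use of population variance and standard deviation rather than the sample versions is an assumption.

```python
import statistics


def expand_feature(name: str, history: list) -> dict:
    """Expand one feature from its values in the 6 preceding months.

    `history` lists the values most recent month first. The result holds
    the six lagged values plus max, min, mean, variance and standard deviation.
    """
    if len(history) != 6:
        raise ValueError("expected values for exactly 6 preceding months")
    # Step 1: carry the preceding months' values into the current month's row.
    expanded = {f"{name}_lag{k}": v for k, v in enumerate(history, start=1)}
    # Step 2: summary statistics over the same 6-month window.
    expanded[f"{name}_max"] = max(history)
    expanded[f"{name}_min"] = min(history)
    expanded[f"{name}_mean"] = statistics.fmean(history)
    expanded[f"{name}_var"] = statistics.pvariance(history)  # population variance (assumption)
    expanded[f"{name}_std"] = statistics.pstdev(history)     # population std dev (assumption)
    return expanded


row = expand_feature("yszb", [0.2, 0.4, 0.6, 0.8, 1.0, 0.0])
```

Each call produces 11 derived columns for one basic feature.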
S14. Feature importance analysis:
After feature expansion, the expanded data is first checked for outliers and missing values; any that are found are handled by the method of step S11. Second, because the features have different value ranges, the data is standardized: it is scaled proportionally into a fixed interval, here uniformly mapped onto [0, 1]. Finally, feature importance analysis is performed. The experiments found 6 columns of missing values and no outliers, so these 6 columns were deleted. A Gini-index-based method was then used to analyze the importance of the 230 features other than the arrears label ('sfyq') and the date ('rcvbl_ym'): the importance of each feature is calculated, and the feature set needed for model construction is selected in preparation for the next modeling step.
S2. Construct the electricity charge risk model with the logistic regression algorithm, using recall, precision, and the F1 score on the historical feature data as evaluation metrics.
Constructing the electricity charge risk model comprises four parts: at-risk user selection, feature selection, training set selection, and algorithm selection. The specific procedure is:
S21. At-risk user selection: two years of data are selected as the object of observation and analyzed in a preliminary pass; users who have never been in arrears are removed from the data, and users with an arrears record are treated as at-risk users. The analysis shows that 93% of the users in the sample have never had an arrears record and only 7% have, while the users in arrears in any given month account for only about 0.6%, so the data set is extremely imbalanced. To improve prediction accuracy, users without an arrears record are regarded as risk-free and removed, and the 7% of users with an arrears record are regarded as at-risk users.
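The filtering rule of S21 can be sketched as follows. The data layout (a mapping from user ID to a list of monthly "sfyq" flags) and the convention that 1 marks a month in arrears are assumptions made for illustration.

```python
def select_at_risk_users(monthly_labels: dict) -> dict:
    """Keep only users with at least one arrears record in the observation window.

    `monthly_labels` maps a user ID to that user's monthly "sfyq" flags,
    where 1 is assumed to mean the user was in arrears that month.
    """
    return {uid: flags for uid, flags in monthly_labels.items() if any(flags)}


# Hypothetical observation window of three months for three users.
observed = {"u1": [0, 0, 0], "u2": [0, 1, 0], "u3": [1, 1, 0]}
at_risk = select_at_risk_users(observed)
```

Removing the users who were never in arrears shrinks the majority class, which is how the text addresses the imbalance in the data set.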
S22. Feature selection: based on the feature importance from step S14, the least important features are removed in successive experiments. Each experiment deletes a portion of the least important features (30, 50, 80, and 100 features respectively), with recall, precision, and the F1 score as the evaluation metrics, and the group of features with the best experimental result is placed in the feature set for model training. Repeated experiments showed that the best feature set is the 232 features obtained after feature expansion (including "sfyq" and "rcvbl_ym"), so no further deletion is needed.
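The experiment loop of S22 can be sketched as a search over how many of the least important features to drop. This is a hedged illustration: `evaluate` is a hypothetical callback standing in for training and scoring a model, and ranking candidates by a single score (for example F1) is an assumption, since the text weighs recall, precision, and F1 together without stating a tie-breaking rule.

```python
from typing import Callable


def drop_least_important(importances: dict, k: int) -> list:
    """Return the feature names that remain after removing the k least important."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[: max(len(ranked) - k, 0)]


def select_feature_set(
    importances: dict,
    drop_counts: tuple,
    evaluate: Callable[[list], float],
) -> tuple:
    """Try each drop count and keep the feature group with the best score."""
    best_features, best_score = None, float("-inf")
    for k in drop_counts:
        features = drop_least_important(importances, k)
        if not features:
            continue  # nothing left to train on
        score = evaluate(features)
        if score > best_score:
            best_features, best_score = features, score
    return best_features, best_score
```

With the drop counts from the text, a call would look like `select_feature_set(importances, (0, 30, 50, 80, 100), evaluate)`, where the count 0 represents keeping every feature, the option the experiments found to be best.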
The recall is the proportion of users actually in arrears that are correctly predicted:
$$\mathrm{recall} = \frac{TP}{TP + FN}$$
The precision is the proportion of users predicted to be in arrears who are actually in arrears:
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
The F1 score is the harmonic mean of recall and precision:
$$F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where TP is the number of users who are actually in arrears and are predicted to be in arrears, TN is the number who are actually not in arrears and are predicted not to be in arrears, FN is the number who are actually in arrears but are predicted not to be in arrears, and FP is the number who are actually not in arrears but are predicted to be in arrears.
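The three metrics follow directly from the counts TP, FP, FN, and TN defined above. The sketch below assumes labels are coded 1 for arrears and 0 for no arrears, and returns 0.0 when a denominator is zero; that convention is an assumption, since the text does not address the degenerate case.

```python
def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Compute recall, precision and F1 for binary arrears predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # arrears, predicted arrears
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no arrears, predicted arrears
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # arrears, predicted no arrears
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


metrics = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 0, 1, 0, 0, 0, 0],
)
```

In this toy input TP = 2, FN = 1, and FP = 1, so recall, precision, and F1 all equal 2/3. TN does not enter any of the three metrics, which is why they suit a data set where users not in arrears dominate.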
S23. Training set selection: electricity consumption data from different periods of the at-risk users is tried in turn, and the group of data with the best experimental result is selected as the training set. The users' electricity consumption data for 3 months, 6 months, and 12 months was tried in turn as the training set; the experiments showed that the 12-month data gives the best result.
S24. Algorithm selection: a time series algorithm, a neural network algorithm, the SVM algorithm, the random forest algorithm, and the logistic regression algorithm are compared in terms of prediction accuracy, running time, and risk level division, and the final electricity charge risk model is determined. The detailed procedure is:
Step 1: Divide the feature data into two groups, a training set and a test set.
Step 2: Select one of the algorithms, train it on the training set to obtain a trained model, then apply the model to the test set to predict whether each user will fall into arrears.
Step 3: Compare the users' actual payment behavior on the test set with the predictions to obtain the recall, precision, and F1 score.
Step 4: Adjust the parameters of the model. Repeated experiments yield multiple groups of recall, precision, and F1 score; the best group is selected and saved in the storage space corresponding to the algorithm.
Step 5: Select another algorithm and repeat Steps 2 to 4 to build a model and obtain its group of recall, precision, and F1 score, which is saved in that algorithm's storage space. Continue selecting algorithms in turn until five models have been built and five groups of recall, precision, and F1 score obtained.
The five groups of recall, precision, and F1 score are compared and analyzed. After repeated experiments, the logistic regression algorithm was finally chosen to construct the electricity charge risk model.
After repeated tests with the logistic regression algorithm on the 231 features that remain once the date is excluded, it was determined that the test result of the electricity charge risk model is best when the users' 12-month data is used as the training set.
The logistic regression algorithm is implemented as follows:
Let the sample be {X, y}, where y takes the value 0 or 1: y = 0 means "no arrears, i.e. the user paid the electricity charge on time", and y = 1 means "arrears, i.e. the user failed to pay the electricity charge on time". X is an n-dimensional sample feature vector with components x1, x2, …, xn. Assuming that sample X belongs to the negative class, the probability of arrears is:
$$P(y = 1 \mid X) = \frac{1}{1 + e^{-g(x)}} \tag{1}$$
where g(x) = θ0 + θ1x1 + θ2x2 + … + θnxn, and θ denotes the regression coefficients θ0, θ1, …, θn.
The values of the regression coefficients θ are obtained as follows:
S24-1: initialize the values of the regression coefficients θ;
S24-2: substitute the regression coefficients θ into formula (1) to obtain the output value of the electricity charge risk model;
S24-3: compute the error between the model output value and the actual data value y according to formula (2);
S24-4: update each regression coefficient θi according to formula (3), where i = 1, 2, 3, …, n and α is a constant coefficient, α = 0.01;
S24-5: set a threshold t = 0.0001 and check whether the stopping condition defined with t holds; if it does, go to step S24-6, otherwise return to step S24-2;
S24-6: output the values of the regression coefficients θi, which determine the logistic regression model.
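Steps S24-1 to S24-6 can be sketched in Python as below. This is a minimal illustration, not the patent's exact procedure: the text does not reproduce formulas (2) and (3) or the stopping test, so the sketch assumes the error is y minus the predicted probability, the update is a batch gradient step averaged over the samples, and iteration stops once the largest coefficient change in one pass falls below t. The function names and the `max_iter` cap are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function of formula (1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, x):
    """Arrears probability for one sample; theta[0] is the intercept."""
    g = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(g)


def fit_logistic(X, y, alpha=0.01, t=0.0001, max_iter=100000):
    """Batch gradient descent following S24-1 to S24-6 (assumed forms)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)                      # S24-1: initialize
    for _ in range(max_iter):
        preds = [predict_proba(theta, x) for x in X]       # S24-2
        errors = [yi - p for yi, p in zip(y, preds)]       # S24-3 (assumed)
        # S24-4 (assumed): theta_i += alpha * mean(error * x_i)
        steps = [alpha * sum(errors) / m]
        for i in range(n):
            grad = sum(e * x[i] for e, x in zip(errors, X)) / m
            steps.append(alpha * grad)
        theta = [th + s for th, s in zip(theta, steps)]
        # S24-5 (assumed): stop when no coefficient moved by t or more
        if max(abs(s) for s in steps) < t:
            break
    return theta                                 # S24-6: output coefficients


# Toy data: one feature, labels overlap so a finite optimum exists.
X_demo = [[0.0], [1.0], [2.0], [3.0], [1.0], [2.0]]
y_demo = [0, 0, 1, 1, 1, 0]
theta_demo = fit_logistic(X_demo, y_demo)
```

With the values α = 0.01 and t = 0.0001 given in the text, this stopping rule ends the loop once every averaged gradient component is smaller than 0.01 in magnitude.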
The regression coefficients determined through modeling analysis of the risk users, together with their corresponding feature names, are as follows:
S3. Predict the risk level of each electricity user with the electricity charge risk model, according to the set threshold I and threshold II.
The risk level of an electricity user is predicted with the electricity charge risk model as follows:
S31. Risk level division: the output of the electricity charge risk model is the probability that each risk user will fall into arrears. Different values of threshold I are set to divide the risk users into high-risk, medium-risk, and low-risk users, defined as follows:
Arrears probability ≥ 0.7: high-risk user;
0.7 > arrears probability ≥ 0.4: medium-risk user;
0.4 > arrears probability ≥ 0: low-risk user.
S32. For the users within one risk level, a threshold II is set that divides them into arrears users and non-arrears users. A user whose arrears probability is greater than or equal to threshold II is judged to fall into arrears in the predicted month; a user whose arrears probability is below threshold II is judged not to fall into arrears in the predicted month. For example, the risk level division in step S31 defines electricity users with arrears probability ≥ 0.7 as high-risk users, but this only assigns a risk level and does not predict whether a user will actually be in arrears. A suitable threshold must therefore be set to split the users of that level into two classes, arrears and no arrears, as the final prediction result.
The values of threshold II used to decide arrears are as follows:
High-risk users: 0.7 (when the arrears probability > 0.7, the high-risk user is judged to fall into arrears next month; otherwise not);
Medium-risk users: 0.4267 (when the arrears probability > 0.4267, the medium-risk user is judged to fall into arrears next month; otherwise not);
Low-risk users: 0.25 (when the arrears probability > 0.25, the low-risk user is judged to fall into arrears next month; otherwise not).
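The two-stage decision of S31 and S32 can be illustrated with a short Python sketch. The threshold values are the ones listed above; the function and constant names are hypothetical. The text states the threshold II rule both as "greater than or equal to" and, in the listed values, as a strict ">", so the sketch assumes the strict comparison used in the listed values.

```python
# Lower bounds of threshold I, checked from the highest level down.
THRESHOLD_I = (("high", 0.7), ("medium", 0.4), ("low", 0.0))

# Threshold II per risk level, as listed in the text.
THRESHOLD_II = {"high": 0.7, "medium": 0.4267, "low": 0.25}


def risk_level(p):
    """Map an arrears probability to a risk level (step S31)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    for level, lower_bound in THRESHOLD_I:
        if p >= lower_bound:
            return level


def predict_arrears(p):
    """Final arrears / no-arrears decision within the level (step S32).

    Assumes the strict comparison shown in the listed threshold II values.
    """
    return p > THRESHOLD_II[risk_level(p)]
```

For instance, a user with arrears probability 0.41 is graded medium-risk but predicted not to fall into arrears, because 0.41 does not exceed 0.4267.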
Taking high-risk users as an example, threshold II is selected as follows:
Step1: Threshold II is swept from 0.7 to 1 in steps of 0.01. At each step the recall, precision, and F1 score of the high-risk users are computed, and the step with the best result gives threshold II for the high-risk users. For example, the first pass takes threshold II = 0.7: high-risk users with arrears probability > 0.7 are treated as in arrears, high-risk users with arrears probability ≤ 0.7 are treated as not in arrears, and the recall, precision, and F1 score of the high-risk users are computed for that setting. Threshold II then takes the values 0.71, 0.72, 0.73, …, 1 in turn. After all passes, multiple groups of recall, precision, and F1 score values are available, and the threshold II of the pass with the best result is selected.
Step2: The experiment is run three times, i.e. three separate months are predicted, and a decision threshold for the high-risk users is obtained for each month.
Step3: The three decision thresholds of the high-risk users are averaged, and the average decision threshold is taken as threshold II for the high-risk users, namely 0.7.
The decision thresholds for medium-risk and low-risk users are obtained by the same procedure, Step1 to Step3.
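The sweep in Step1 to Step3 can be sketched as follows. The text does not say how "best result" is decided among recall, precision, and F1 score, so this sketch assumes the highest F1 score wins and that ties go to the smallest threshold. All function names and the sample data are hypothetical.

```python
def recall_precision_f1(y_true, y_pred):
    """Recall, precision and F1 with arrears (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1


def sweep_threshold(probs, y_true, lo=0.70, hi=1.00, step=0.01):
    """Step1: try each candidate threshold II and keep the best F1."""
    best_thr, best_f1 = None, -1.0
    for k in range(round((hi - lo) / step) + 1):
        thr = round(lo + k * step, 6)          # avoid float drift
        y_pred = [1 if p > thr else 0 for p in probs]
        _, _, f1 = recall_precision_f1(y_true, y_pred)
        if f1 > best_f1:                       # ties keep the smaller thr
            best_thr, best_f1 = thr, f1
    return best_thr


def average_threshold(monthly_thresholds):
    """Step3: average the per-month thresholds obtained in Step2."""
    return sum(monthly_thresholds) / len(monthly_thresholds)


# Hypothetical high-risk users for one predicted month.
probs_demo = [0.72, 0.75, 0.80, 0.90, 0.95]
labels_demo = [0, 0, 1, 1, 1]
thr_demo = sweep_threshold(probs_demo, labels_demo)
```

For medium-risk and low-risk users the same function would be called with `lo` and `hi` set to the bounds of that risk level.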
S4. Verify the electricity charge risk model by comparing its output with the users' actual payment behavior, and, according to the verification results, optimize the regularization term and penalty factor of the logistic regression algorithm to improve the recall, precision, and F1 score. This yields an optimal electricity charge risk model, which is then used to predict the arrears of electricity users over multiple months in order to demonstrate the stability of the model.
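The text does not state which regularization term is used. As one common possibility, assumed here purely for illustration, an L2 penalty with penalty factor λ over m training samples gives the cost function

$$J(\theta) = -\frac{1}{m}\sum_{k=1}^{m}\left[ y^{(k)} \log \hat{p}^{(k)} + \left(1 - y^{(k)}\right) \log\left(1 - \hat{p}^{(k)}\right) \right] + \frac{\lambda}{2m}\sum_{i=1}^{n}\theta_i^{2}, \qquad \hat{p}^{(k)} = \frac{1}{1 + e^{-g(x^{(k)})}}$$

and the corresponding gradient step for each penalized coefficient

$$\theta_i \leftarrow \theta_i + \alpha\left[ \frac{1}{m}\sum_{k=1}^{m}\left( y^{(k)} - \hat{p}^{(k)} \right) x_i^{(k)} - \frac{\lambda}{m}\,\theta_i \right], \qquad i = 1, 2, \ldots, n.$$

Under this assumption, a larger λ shrinks the coefficients more strongly, so tuning λ against the verification results trades fit on the training months against stability on later months. The symbols m, λ, and the superscript (k) are introduced here and do not appear in the original.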
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. A method for constructing an electricity charge risk model from electric power big data, characterized by comprising:
S1. obtaining historical feature data of current electricity users from the internal marketing system of the State Grid;
S2. constructing an electricity charge risk model using a logistic regression algorithm, based on the recall, precision, and F1 score obtained on the historical feature data;
S3. predicting the risk level of electricity users with the electricity charge risk model according to a set threshold I and threshold II;
S4. verifying the electricity charge risk model by comparing its output with the users' actual payment behavior, and optimizing the parameters of the logistic regression algorithm according to the verification results.
2. The method for constructing an electricity charge risk model from electric power big data according to claim 1, characterized in that the historical feature data of the electricity users are obtained as follows:
S11. Feature data preprocessing:
(1) Feature data cleaning: after the feature data are obtained from the internal marketing system of the State Grid, outliers and missing values in the feature data are handled; outliers are replaced with the value "1" and missing values are filled with the value "0";
(2) Feature data format conversion: text-type feature data are converted to numeric feature data, and categorical feature data are category-encoded;
S12. Historical arrears behavior correlation analysis:
(1) Historical arrears behavior correlation analysis: the historical feature data of the users are collected, and the correlation between a user's arrears and payment history is analyzed;
(2) Feature correlation analysis: the correlations between different features are analyzed to find the features that correlate strongly with whether a user falls into arrears;
S13. Feature data expansion:
The feature data are expanded with information on the electricity users' consumption behavior over the most recent 6 months; this information comprises the maximum, minimum, mean, variance, and standard deviation of the feature data;
S14. Feature importance analysis:
After the feature data are expanded, they are standardized, and the feature importance of each feature is calculated using the Gini index.
3. The method for constructing an electricity charge risk model from electric power big data according to claim 1 or 2, characterized in that constructing the electricity charge risk model comprises four parts: selection of risk users, selection of feature data, selection of the training set, and selection of the algorithm, implemented as follows:
S21. Risk user selection: two years of data are taken as the object of observation; electricity users who have never been in arrears are removed from the data, and electricity users with an arrears record are taken as risk users;
S22. Feature data selection: according to the feature importance information from step S14, the features with the lowest importance are deleted one at a time and an experiment is run after each deletion, with recall, precision, and F1 score as the evaluation criteria; the group of features with the best experimental result is saved to the feature set;
S23. Training set selection: electricity consumption data of the risk users over different periods are tested in turn, and the group of electricity consumption data with the best experimental result is chosen as the training set;
S24. Algorithm selection: a time series algorithm, a neural network algorithm, an SVM algorithm, a random forest algorithm, and a logistic regression algorithm are compared in terms of prediction accuracy and running time, and the logistic regression algorithm is selected to construct the electricity charge risk model.
4. The method for constructing an electricity charge risk model from electric power big data according to claim 1, characterized in that the risk level of an electricity user is predicted with the electricity charge risk model as follows:
S31. Risk level division: the probability that each risk user will fall into arrears next month is predicted with the electricity charge risk model, and different values of threshold I are set to divide the risk users into high-risk, medium-risk, and low-risk users;
S32. For the users within one risk level, a threshold II is set that divides them into arrears users and non-arrears users; a user whose arrears probability is greater than or equal to threshold II is judged to fall into arrears in the predicted month, and a user whose arrears probability is below threshold II is judged not to fall into arrears in the predicted month.
5. The method for constructing an electricity charge risk model from electric power big data according to claim 3, characterized in that the recall in step S22 is the proportion of actual arrears users that are correctly predicted:
$$\mathrm{recall} = \frac{TP}{TP + FN}$$
the precision is the proportion of all users predicted to be in arrears that are correctly predicted:
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
and the F1 score is the harmonic mean of recall and precision:
$$F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where TP is the number of users actually in arrears and predicted to be in arrears, TN is the number of users actually not in arrears and predicted not to be in arrears, FN is the number of users actually in arrears but predicted not to be in arrears, and FP is the number of users actually not in arrears but predicted to be in arrears.
6. The method for constructing an electricity charge risk model from electric power big data according to claim 5, characterized in that the logistic regression algorithm in step S24 is implemented as follows:
Let the sample be {X, y}, where y takes the value 0 or 1: y = 0 means "no arrears, i.e. the user paid the electricity charge on time", and y = 1 means "arrears, i.e. the user failed to pay the electricity charge on time"; X is an n-dimensional sample feature vector with components x1, x2, …, xn; assuming that sample X belongs to the negative class, the probability of arrears is:
$$P(y = 1 \mid X) = \frac{1}{1 + e^{-g(x)}} \tag{1}$$
where g(x) = θ0 + θ1x1 + θ2x2 + … + θnxn, and θ denotes the regression coefficients θ0, θ1, …, θn;
The values of the regression coefficients θ are obtained as follows:
S24-1: initialize the values of the regression coefficients θ;
S24-2: substitute the regression coefficients θ into formula (1) to obtain the output value of the electricity charge risk model;
S24-3: compute the error between the output value of the electricity charge risk model and the actual data value y according to formula (2);
S24-4: update each regression coefficient θi according to formula (3), where i = 1, 2, 3, …, n and α is a constant coefficient;
S24-5: set a threshold t and check whether the stopping condition defined with t holds; if it does, go to step S24-6, otherwise return to step S24-2;
S24-6: output the values of the regression coefficients θi, which determine the logistic regression model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910509737.9A CN110210686A (en) | 2019-06-13 | 2019-06-13 | A kind of electricity charge risk model construction method of electric power big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110210686A (en) | 2019-09-06 |
Family
ID=67792298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910509737.9A Pending CN110210686A (en) | 2019-06-13 | 2019-06-13 | A kind of electricity charge risk model construction method of electric power big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210686A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030023466A1 (en) * | 2001-07-27 | 2003-01-30 | Harper Charles N. | Decision support system and method |
CN106251049A (en) * | 2016-07-25 | 2016-12-21 | 国网浙江省电力公司宁波供电公司 | A kind of electricity charge risk model construction method of big data |
Non-Patent Citations (3)
Title |
---|
厉建宾等: ""基于大数据分析的客户电费风险预测及防控"", 《电力大数据》 * |
李晓蕾等: ""基于改进随机森林的电力用户欠费风险分析预警"", 《电测与仪表》 * |
赵少东等: ""基于熵值法的电力客户敏感度综合评价模型研究"", 《电工技术》 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111198907A (en) * | 2019-12-24 | 2020-05-26 | 深圳供电局有限公司 | Method and device for identifying potential defaulting user, computer equipment and storage medium |
CN111310785A (en) * | 2020-01-15 | 2020-06-19 | 杭州华网信息技术有限公司 | National power grid mechanical external damage prediction method |
CN111340375A (en) * | 2020-02-28 | 2020-06-26 | 创新奇智(上海)科技有限公司 | Electricity charge recycling risk prediction method and device, electronic equipment and storage medium |
CN111639883A (en) * | 2020-06-15 | 2020-09-08 | 江苏电力信息技术有限公司 | Electricity charge recycling risk prediction method based on machine learning |
CN112036682A (en) * | 2020-07-10 | 2020-12-04 | 广西电网有限责任公司 | Early warning method and device for frequent power failure |
CN112614342A (en) * | 2020-12-10 | 2021-04-06 | 大唐高鸿数据网络技术股份有限公司 | Early warning method for road abnormal event, vehicle-mounted equipment and road side equipment |
CN112801709A (en) * | 2021-02-05 | 2021-05-14 | 杭州拼便宜网络科技有限公司 | User loss prediction method, device, equipment and storage medium |
CN112948367A (en) * | 2021-03-24 | 2021-06-11 | 国网浙江省电力有限公司物资分公司 | Data cleaning system for power material configuration demand measurement and calculation |
CN113256008A (en) * | 2021-05-31 | 2021-08-13 | 国家电网有限公司大数据中心 | Arrearage risk level determination method, device, equipment and storage medium |
CN113592140A (en) * | 2021-06-22 | 2021-11-02 | 国网宁夏电力有限公司吴忠供电公司 | Electric charge payment prediction model training system and electric charge payment prediction model |
CN113554454A (en) * | 2021-06-30 | 2021-10-26 | 西安图迹信息科技有限公司 | Big data is sold system with electric power and equipment thereof |
CN114374618A (en) * | 2021-12-24 | 2022-04-19 | 中国电信股份有限公司 | Training method, user arrearage off-network prediction method and device |
CN115099478A (en) * | 2022-06-17 | 2022-09-23 | 国网数字科技控股有限公司 | User electricity consumption behavior prediction method and device, electronic equipment and storage medium |
CN115905319A (en) * | 2022-11-16 | 2023-04-04 | 国网山东省电力公司营销服务中心(计量中心) | Automatic identification method and system for abnormal electricity charges of massive users |
CN115905319B (en) * | 2022-11-16 | 2024-04-19 | 国网山东省电力公司营销服务中心(计量中心) | Automatic identification method and system for abnormal electricity fees of massive users |
CN117151768A (en) * | 2023-10-30 | 2023-12-01 | 国网浙江省电力有限公司营销服务中心 | Construction method and system of wind control rule base of generated marketing event |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190906 |