CN110210686A - A kind of electricity charge risk model construction method of electric power big data - Google Patents

A kind of electricity charge risk model construction method of electric power big data

Info

Publication number
CN110210686A
Authority
CN
China
Prior art keywords
user
arrearage
characteristic
risk
electricity charge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910509737.9A
Other languages
Chinese (zh)
Inventor
吴怀广
马江涛
尚松涛
陶红伟
胡宗山
张明星
石永生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry
Priority to CN201910509737.9A
Publication of CN110210686A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention proposes a method for constructing an electricity charge risk model based on electric power big data. First, the historical electricity consumption data of current electricity users is obtained from the State Grid's internal marketing system. Then an electricity charge risk model is constructed using a logistic regression algorithm, with recall, precision, and the F1 score on the historical feature data as evaluation metrics. The model predicts each user's risk level according to preset thresholds I and II. The model is validated by comparing its output with users' actual payment behavior, and the parameters of the logistic regression algorithm are optimized according to the validation results. The invention builds the electricity charge risk model from users' historical information, uses it to predict whether each user will fall into arrears in the following month, and automatically screens users by arrears risk. Marketing personnel can then take timely prevention and control measures for at-risk users based on the predictions, which saves substantial manpower and material resources and improves working efficiency.

Description

A kind of electricity charge risk model construction method of electric power big data
Technical field
The present invention relates to methods for constructing electricity charge risk models, and in particular to a method for constructing an electricity charge risk model based on electric power big data.
Background technique
Electricity charge recovery is a major systems engineering task. Its results are closely tied to the business performance of power supply enterprises, and it has always been a key part of power marketing. Power supply enterprises have very large customer bases, but customers' creditworthiness varies widely, and creditworthiness strongly affects whether electricity charges are paid. In pursuit of short-term economic gain, some customers deliberately default on their electricity charges and thereby tie up the power supply enterprise's funds, which seriously hampers the recovery of electricity charges and the enterprise's further development.
At present, the risk of electricity charge recovery is assessed mainly by hand. Experienced recovery personnel analyze a user's payment history with statistical tools to judge how likely the user is to fall into arrears. This approach requires personnel with relevant background experience and familiarity with the details of individual users, and it takes a long period of accumulated experience before arrears risk can be judged with reasonable accuracy. It is therefore difficult to apply at scale. As the number of electricity users grows and their usage environments become more complex, the traditional approach can no longer cope. What is needed is an objective, automated method for assessing electricity charge recovery risk that evaluates each user's arrears risk from historical information and issues an early warning for users whose risk exceeds a given threshold.
Summary of the invention
Traditional electricity charge recovery methods depend on the background experience of recovery personnel, so estimates of recovery risk vary from person to person. To address this technical problem, the invention proposes a method for constructing an electricity charge risk model based on electric power big data. It provides an objective, automated method for assessing recovery risk that warns marketing personnel in advance so that they can take prevention and control measures, allowing electricity charge recovery to proceed smoothly.
The technical scheme of the present invention is realized as follows:
A method for constructing an electricity charge risk model based on electric power big data, comprising:
S1, obtaining the historical feature data of current electricity users from the State Grid's internal marketing system;
S2, constructing the electricity charge risk model using a logistic regression algorithm, with recall, precision, and the F1 score on the historical feature data as evaluation metrics;
S3, predicting the risk level of each electricity user with the electricity charge risk model according to preset thresholds I and II;
S4, validating the electricity charge risk model by comparing its output with users' actual payment behavior, and optimizing the parameters of the logistic regression algorithm according to the validation results.
Preferably, the historical feature data of electricity users is obtained as follows:
S11, feature data preprocessing:
(1) Feature data cleaning: after the feature data is obtained from the State Grid's internal marketing system, outliers and missing values in the feature data are handled; outliers are replaced with the value "1", and missing values are filled with the value "0";
(2) Feature data format conversion: text-type feature data is converted into numeric feature data, and categorical feature data is encoded by category;
S12, historical arrears behavior correlation analysis:
(1) Historical arrears behavior correlation analysis: the historical feature data of users is collected, and the correlation between a user's arrears status and payment history is analyzed;
(2) Feature correlation analysis: the correlation between different features is analyzed to find the features that are highly correlated with whether a user falls into arrears;
S13, feature data expansion:
The feature data is expanded using information on each user's electricity usage behavior over the most recent 6 months, including the maximum, minimum, mean, variance, and standard deviation of the feature data;
S14, feature importance analysis:
After the feature data is expanded, it is normalized, and the importance of each feature is calculated using the Gini index.
Preferably, constructing the electricity charge risk model comprises four parts: risk user selection, feature selection, training set selection, and algorithm selection. The process is implemented as follows:
S21, risk user selection: 2 years of data are selected as the observation target; users who have never been in arrears are removed from the data, and users with an arrears record are treated as risk users;
S22, feature selection: based on the feature importance obtained in step S14, the features with the lowest importance are removed in successive experiments, with recall, precision, and the F1 score as evaluation metrics, and the group of features that gives the best experimental result is saved to the feature set;
S23, training set selection: electricity usage data of risk users over periods of different lengths is tried in turn, and the data that gives the best experimental result is selected as the training set;
S24, algorithm selection: a time series algorithm, a neural network algorithm, an SVM algorithm, a random forest algorithm, and a logistic regression algorithm are compared in terms of prediction accuracy and running time, and the logistic regression algorithm is selected to build the electricity charge risk model.
Preferably, the risk level of an electricity user is predicted with the electricity charge risk model as follows:
S31, risk level division: the electricity charge risk model predicts each risk user's probability of falling into arrears in the next month, and different thresholds I are set to divide risk users into high-risk, medium-risk, and low-risk users;
S32, for users of the same risk level, a threshold II is set to divide them into arrears users and non-arrears users; a user whose arrears probability is greater than or equal to threshold II is predicted to fall into arrears in the predicted month, and a user whose arrears probability is below threshold II is predicted not to.
Preferably, the recall in step S22 is the proportion of actual arrears users that are correctly predicted:
$$\mathrm{recall} = \frac{TP}{TP + FN}$$
The precision is the proportion of users predicted to be in arrears who actually are in arrears:
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
The F1 score is the harmonic mean of recall and precision:
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where TP is the number of users who are actually in arrears and are predicted to be in arrears, TN is the number who are actually not in arrears and are predicted not to be, FN is the number who are actually in arrears but are predicted not to be, and FP is the number who are actually not in arrears but are predicted to be.
Preferably, the logistic regression algorithm in step S24 is implemented as follows:
Let a sample be {X, y}, where y takes the value 0 or 1: y = 0 means "not in arrears, i.e. the user paid the electricity charge on time", and y = 1 means "in arrears, i.e. the user failed to pay the electricity charge on time". X is an n-dimensional feature vector with components x_1, x_2, …, x_n. The probability that sample X belongs to the arrears class (y = 1) is:
$$P(y = 1 \mid X) = \frac{1}{1 + e^{-g(X)}} \tag{1}$$
where g(X) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n, and θ denotes the regression coefficients θ_0, θ_1, …, θ_n.
The value of the regression coefficients θ is determined as follows:
S24-1, the regression coefficients θ are initialized;
S24-2, the regression coefficients θ are substituted into formula (1) to obtain the output value of the electricity charge risk model;
S24-3, the error between the output value of the electricity charge risk model and the actual value y is calculated according to formula (2);
S24-4, each regression coefficient θ_i is updated according to formula (3),
where i = 1, 2, 3, …, n, and α is a constant coefficient;
S24-5, a threshold t is set, and the threshold condition is checked; if it holds, step S24-6 is executed, otherwise the process returns to step S24-2;
S24-6, the values of the regression coefficients θ_i are output, and the logistic regression model is determined from them.
Beneficial effects of the invention: the invention builds the electricity charge risk model from users' historical information, uses it to predict whether each user will fall into arrears in the following month, and automatically screens users by arrears risk. Marketing personnel can then take timely prevention and control measures for at-risk users based on the predictions, which saves substantial manpower and material resources and improves working efficiency.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the present invention.
Specific embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
As shown in Fig. 1, a method for constructing an electricity charge risk model based on electric power big data comprises the following steps:
S1, obtaining the historical feature data of current electricity users from the State Grid's internal marketing system.
The historical feature data of electricity users is obtained as follows:
S11, feature data preprocessing:
(1) Feature data cleaning:
After the feature data is obtained from the State Grid's internal marketing system, it is cleaned. First, a computer program outputs the maximum and minimum of each feature so that each feature can be checked against its normal range and its outliers and missing values identified. An outlier is a feature value far outside the normal range; a missing value is an empty feature value. Outliers are replaced according to the business meaning of the feature. For example, in the feature "jfjsl" (payment timeliness rate), the value "999999" does not meet business requirements and is replaced with the value "1". The features "yszb" (advance-payment offset ratio) and "ysnumzb" (advance-payment carry-over count ratio) are handled in the same way as "jfjsl". Missing values are filled with the value "0".
(2) Data format conversion: text-type data is converted into numeric data, and categorical data is encoded by category.
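The cleaning and encoding rules of step S11 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (one dict per user), the helper names `clean_record` and `encode_categories`, and the category labels in the usage note are assumptions; only the sentinel value 999999, the replacement values 1 and 0, and the three feature names come from the text.

```python
# Sentinel and feature names taken from the example in the text; the
# dict-per-user record layout is an assumption made for illustration.
SENTINEL = 999999
RATIO_FEATURES = ("jfjsl", "yszb", "ysnumzb")

def clean_record(record):
    """Return a cleaned copy of one user's feature dict."""
    cleaned = {}
    for name, value in record.items():
        if value is None:
            cleaned[name] = 0            # missing value -> 0
        elif name in RATIO_FEATURES and value == SENTINEL:
            cleaned[name] = 1            # out-of-range sentinel -> 1
        else:
            cleaned[name] = value
    return cleaned

def encode_categories(values):
    """Map each distinct label to an integer code, in order of first appearance."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes
```

For instance, `clean_record({"jfjsl": 999999, "yszb": None})` yields `{"jfjsl": 1, "yszb": 0}`, and `encode_categories(["urban", "rural", "urban"])` assigns the codes `[0, 1, 0]` (the labels here are hypothetical).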
S12, correlation analysis:
Correlation analysis comprises two parts: historical arrears behavior correlation analysis and feature correlation analysis.
(1) Historical arrears behavior correlation analysis: the historical data of electricity users is collected, and the correlation between a user's arrears status and payment history is analyzed. For each user, the Pearson coefficient between months is calculated. The analysis shows that whether a user falls into arrears in the current month is correlated with the user's electricity usage behavior in recent months, and that the Pearson coefficient shrinks as the time interval grows, so the correlation decays gradually. The Pearson coefficient measures the strength of the linear relationship between two variables. Its value lies in [-1, 1]: a positive coefficient means the two variables are positively correlated, a negative coefficient means they are negatively correlated, and the larger the absolute value, the stronger the correlation.
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where n is the number of samples, X and Y are the variables, X_i and Y_i are the i-th values of X and Y, and \bar{X} and \bar{Y} are the mean values of X and Y.
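The Pearson coefficient described above can be computed directly from its definition. The sketch below is illustrative only; the function name `pearson` and the choice to raise an error for constant series are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("coefficient is undefined for a constant series")
    return cov / (dev_x * dev_y)
```

Applied to one user's monthly values at increasing lags, a shrinking result would correspond to the decaying correlation reported in the analysis.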
(2) Feature correlation analysis: the correlation between different features is analyzed to find the features that are highly correlated with whether a user falls into arrears. The raw data extracted from the electric power marketing system contains 34 base features, listed in Table 1:
Table 1: Base features
Feature correlation analysis is performed on the 34 base features: the Pearson coefficients between features are used to analyze the correlation between features and to find those highly correlated with whether a user falls into arrears.
S13, feature expansion:
Extensive statistical analysis and a series of experiments show that the 34 features cannot accurately describe whether a user is likely to fall into arrears. The base features are therefore expanded in a series of steps.
According to the historical arrears behavior correlation analysis in step S12, a user's payment status in the current month is correlated to varying degrees with the preceding 6 months. The user's feature data for the preceding 6 months is therefore added to the current month's data. Taking the expansion of yszb (advance-payment offset ratio) as an example, the method is shown in Table 2:
Table 2: Expansion according to historical arrears behavior correlation
Then 14 of the 24 features are expanded further with the maximum, minimum, mean, variance, and standard deviation of the feature data. Taking the expansion of yszb (advance-payment offset ratio) as an example, the method is shown in Table 3:
Table 3: Expansion according to feature correlation
The 14 features are: "yszb", "jfjsl", "cashchk_day_num", "hksc", "t_pq", "t_pq_sum", "release_day", "charge_day", "end_day", "end_day2", "remain_daynum", "remain_daynum2", "all_daynum", "all_daynum2". These features are expanded by the method of Table 3.
Feature expansion yields 236 features in total. Each user has a monthly label indicating whether the user is in arrears, "sfyq"; a further feature, "sfyq_sum" (the number of arrears in the preceding 6 months), is also added, giving 238 features in the end.
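As a rough illustration of the two expansion steps, the sketch below derives six lag features and five summary statistics from one feature's 6-month history. The output column names (such as `yszb_1` or `yszb_max`) are hypothetical, since Tables 2 and 3 are not reproduced in the text, and the use of population variance and standard deviation is likewise an assumption.

```python
import statistics

def expand_feature(name, history):
    """Expand one feature from its previous 6 monthly values (oldest first)."""
    if len(history) != 6:
        raise ValueError("expected exactly 6 monthly values")
    expanded = {}
    # Lag features: <name>_1 is the previous month, <name>_6 is six months back.
    for lag in range(1, 7):
        expanded[f"{name}_{lag}"] = history[-lag]
    # Summary statistics over the same 6-month window.
    expanded[f"{name}_max"] = max(history)
    expanded[f"{name}_min"] = min(history)
    expanded[f"{name}_mean"] = statistics.fmean(history)
    expanded[f"{name}_var"] = statistics.pvariance(history)   # population variance (assumed)
    expanded[f"{name}_std"] = statistics.pstdev(history)      # population std (assumed)
    return expanded
```

Calling `expand_feature("yszb", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])` produces 11 derived columns for that feature.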
S14, feature importance analysis:
After feature expansion, the expanded data is first checked for outliers and missing values; any that are found are handled by the method in step S11. Second, because the features have different value ranges, the data is normalized: it is scaled proportionally so that it falls into a specific interval, here mapped uniformly onto [0, 1]. Finally, feature importance analysis is performed. During the experiments, 6 columns were found to contain missing values and none contained outliers, so the columns with missing values were deleted. A Gini-index-based method was then applied to the 230 features other than the arrears label ('sfyq') and the date ('rcvbl_ym') to calculate the importance of each feature, and the feature set needed for model building was selected in preparation for the modeling step.
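The normalization and Gini-based scoring can be sketched as below. The text does not say how the Gini index is turned into an importance score, so this sketch makes a simplifying assumption: a feature's importance is the largest decrease in Gini impurity achievable with a single threshold split on that feature. Tree-based implementations usually aggregate such decreases over many splits. All function names are illustrative.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)               # equals 1 - p^2 - (1 - p)^2

def gini_importance(values, labels):
    """Largest impurity decrease from one threshold split (assumed scoring rule)."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best
```

A feature that separates arrears from non-arrears users perfectly scores 0.5 on a balanced sample, while an uninformative feature scores 0.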
S2, constructing the electricity charge risk model using a logistic regression algorithm, with recall, precision, and the F1 score on the historical feature data as evaluation metrics;
Constructing the electricity charge risk model comprises four parts: risk user selection, feature selection, training set selection, and algorithm selection. The process is implemented as follows:
S21, risk user selection: 2 years of data are selected as the observation target and analyzed preliminarily; users who have never been in arrears are removed, and users with an arrears record are treated as risk users. The analysis shows that 93% of the users in the sample have never had an arrears record and only 7% have, while the users in arrears in any given month account for only about 0.6%, so the data set is extremely imbalanced. To improve prediction accuracy, users without an arrears record are treated as risk-free and removed, and the 7% of users with an arrears record are treated as risk users.
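The risk user filter of step S21 amounts to keeping only users with at least one arrears flag in the observation window. A minimal sketch, assuming the monthly "sfyq" labels are available as a list of 0/1 flags per user (the input layout and function name are hypothetical):

```python
def select_risk_users(arrears_history):
    """Keep users with at least one arrears month.

    arrears_history maps a user id to that user's monthly 0/1 'sfyq' flags
    over the observation window (layout assumed for illustration).
    """
    return {uid: flags for uid, flags in arrears_history.items() if any(flags)}

history = {
    "u1": [0, 0, 0, 0],    # never in arrears: treated as risk-free and dropped
    "u2": [0, 1, 0, 0],    # one arrears record: kept as a risk user
    "u3": [1, 1, 0, 1],
}
risk_users = select_risk_users(history)
```

Here `risk_users` contains only `u2` and `u3`.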
S22, feature selection: based on the feature importance obtained in step S14, the least important features are removed in successive experiments. Each experiment removes a batch of the least important features (30, 50, 80, and 100 features, respectively), with recall, precision, and the F1 score as evaluation metrics, and the group of features giving the best result is placed in the feature set for model training. Repeated experiments showed that the best feature set is the full set of 232 features obtained after feature expansion (including "sfyq" and "rcvbl_ym"); no further features need to be removed.
The recall is the proportion of actual arrears users that are correctly predicted:
$$\mathrm{recall} = \frac{TP}{TP + FN}$$
The precision is the proportion of users predicted to be in arrears who actually are in arrears:
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
The F1 score is the harmonic mean of recall and precision:
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where TP is the number of users who are actually in arrears and are predicted to be in arrears, TN is the number who are actually not in arrears and are predicted not to be, FN is the number who are actually in arrears but are predicted not to be, and FP is the number who are actually not in arrears but are predicted to be.
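The three metrics follow directly from the counts TP, FP, and FN defined above. The sketch below computes them from 0/1 label lists; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def recall_precision_f1(y_true, y_pred):
    """Compute (recall, precision, F1) for 0/1 labels, where 1 means arrears."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Zero denominators yield 0.0 (a convention chosen for this sketch).
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```

With true labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 1, 0, 1]`, TP = 2, FN = 1, and FP = 1, so recall, precision, and F1 all equal 2/3.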
S23, training set selection: electricity usage data of risk users over periods of different lengths is tried in turn, and the data giving the best experimental result is selected as the training set. Users' electricity usage data for 3 months, 6 months, and 12 months was tried in turn as the training set; the experiments showed that 12 months of data gives the best result.
S24, algorithm selection: a time series algorithm, a neural network algorithm, an SVM algorithm, a random forest algorithm, and a logistic regression algorithm are compared in terms of prediction accuracy, running time, and risk level division to determine the final electricity charge risk model. The detailed process is as follows:
Step1: the feature data is divided into two groups, a training set and a test set.
Step2: one of the algorithms is selected and trained on the training set; the trained model is then applied to the test set to predict whether each user will fall into arrears.
Step3: the users' actual payment status on the test set is compared with the predictions to obtain recall, precision, and the F1 score.
Step4: the parameters of the model are adjusted, and repeated experiments yield multiple groups of recall, precision, and F1 score; the best group is selected and saved in the storage space associated with the algorithm.
Step5: another algorithm is selected, and Step2 to Step4 are repeated to build a model and save its best group of recall, precision, and F1 score in the corresponding storage space; algorithms are selected in turn until five models have been built and five groups of recall, precision, and F1 score obtained.
The five groups of recall, precision, and F1 score are compared and analyzed; after repeated experiments, the logistic regression algorithm was finally chosen to build the electricity charge risk model.
In repeated tests, the logistic regression algorithm was run on the 231 features retained after excluding the date; the electricity charge risk model gave the best test results when 12 months of user data was used as the training set.
The logistic regression algorithm is implemented as follows:
Let a sample be {X, y}, where y takes the value 0 or 1: y = 0 means "not in arrears, i.e. the user paid the electricity charge on time", and y = 1 means "in arrears, i.e. the user failed to pay the electricity charge on time". X is an n-dimensional feature vector with components x_1, x_2, …, x_n. The probability that sample X belongs to the arrears class (y = 1) is:
$$P(y = 1 \mid X) = \frac{1}{1 + e^{-g(X)}} \tag{1}$$
where g(X) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n, and θ denotes the regression coefficients θ_0, θ_1, …, θ_n.
The value of the regression coefficients θ is determined as follows:
S24-1, the regression coefficients θ are initialized;
S24-2, the regression coefficients θ are substituted into formula (1) to obtain the output value of the electricity charge risk model;
S24-3, the error between the model output value and the actual value y is calculated according to formula (2);
S24-4, each regression coefficient θ_i is updated according to formula (3),
where i = 1, 2, 3, …, n, and α is a constant coefficient, α = 0.01;
S24-5, a threshold t = 0.0001 is set, and the threshold condition is checked; if it holds, step S24-6 is executed, otherwise the process returns to step S24-2;
S24-6, the values of the regression coefficients θ_i are output, and the logistic regression model is determined from them.
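Steps S24-1 to S24-6 can be sketched as a batch gradient descent loop. Because formulas (2) and (3) and the exact test in S24-5 are not reproduced in the text, three parts of this sketch are assumptions rather than the patented method: the error is taken to be the mean squared difference between output and label, the update is the standard gradient step for logistic regression, and iteration stops when the error falls below t. The iteration cap `max_iter` is also an addition, so the loop ends on data where the error never reaches t. Only α = 0.01 and t = 0.0001 come from the text.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(theta, x):
    """Formula (1): arrears probability for one feature vector x."""
    g = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(g)

def fit_logistic(X, y, alpha=0.01, t=0.0001, max_iter=10000):
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)                                   # S24-1: initialize
    for _ in range(max_iter):
        preds = [predict_proba(theta, x) for x in X]          # S24-2: model output
        residuals = [p - yi for p, yi in zip(preds, y)]
        error = sum(r * r for r in residuals) / m             # S24-3: assumed MSE
        theta[0] -= alpha * sum(residuals) / m                # S24-4: assumed update
        for i in range(n):
            grad = sum(r * x[i] for r, x in zip(residuals, X)) / m
            theta[i + 1] -= alpha * grad
        if error < t:                                         # S24-5: assumed test
            break
    return theta                                              # S24-6: coefficients
```

Fitting a single feature that increases with arrears risk, for example `fit_logistic([[0.0], [0.1], [0.9], [1.0]], [0, 0, 1, 1])`, gives a positive coefficient for that feature.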
Modeling of the risk users determines the regression coefficients and their corresponding feature names as follows:
S3, predicting the risk level of each electricity user with the electricity charge risk model according to preset thresholds I and II;
The risk level of an electricity user is predicted with the electricity charge risk model as follows:
S31, risk level division: the output of the electricity charge risk model is each risk user's probability of falling into arrears; different thresholds I are set to divide risk users into high-risk, medium-risk, and low-risk users.
The levels are defined as follows:
Arrears probability ≥ 0.7: high-risk user;
0.7 > arrears probability ≥ 0.4: medium-risk user;
0.4 > arrears probability ≥ 0: low-risk user.
S32, for the user of same risk class, a threshold value II is set, user is divided into defaulting subscriber and not arrearage User;User of the arrearage probability more than or equal to threshold value II judge the user prediction month will arrearage, arrearage probability is small Judge that the user will not arrearage in prediction month in the user of threshold value II.Such as: the risk class in step S31 divides In, the electricity consumption user of arrearage probability >=0.7 is defined as high risk user, electricity consumption user has only been done a wind by this process The judgement of dangerous grade, do not predict electricity consumption user whether arrearage, it is therefore desirable to suitable threshold value is set by the use of the grade Family is divided into two classes of arrearage and not arrearage as final prediction result.
The values of threshold II used to decide whether a user will fall into arrears are as follows:
High-risk users: 0.7 (that is, when the arrears probability > 0.7, the high-risk user is judged to fall into arrears next month; otherwise not);
Medium-risk users: 0.4267 (that is, when the arrears probability > 0.4267, the medium-risk user is judged to fall into arrears next month; otherwise not);
Low-risk users: 0.25 (that is, when the arrears probability > 0.25, the low-risk user is judged to fall into arrears next month; otherwise not).
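A minimal sketch of the per-level decision follows. The three threshold II values are taken from the list above; the names `THRESHOLD_II` and `predicts_arrears` are hypothetical. Note that step S32 says "greater than or equal to threshold II" while the parenthetical examples use a strict ">"; the sketch assumes the strict comparison from the examples.

```python
# Threshold II per risk level, as listed in the text.
THRESHOLD_II = {"high": 0.7, "medium": 0.4267, "low": 0.25}


def predicts_arrears(p_arrears: float, level: str) -> bool:
    """Return True if a user of the given risk level is judged to fall into arrears.

    Assumption: strict ">" as in the parenthetical examples; the main text of
    step S32 uses ">=" instead.
    """
    return p_arrears > THRESHOLD_II[level]


print(predicts_arrears(0.80, "high"))    # True
print(predicts_arrears(0.42, "medium"))  # False
print(predicts_arrears(0.30, "low"))     # True
```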
Taking high-risk users as an example, threshold II is selected as follows:
Step 1: Threshold II is swept from 0.7 to 1 in steps of 0.01. In each iteration, the recall, precision and F1 score of the high-risk users are calculated, and the iteration with the best result gives the threshold II of the high-risk users. For example, threshold II = 0.7 is taken in the first iteration: high-risk users with an arrears probability > 0.7 are regarded as in arrears, high-risk users with an arrears probability ≤ 0.7 are regarded as not in arrears, and the recall, precision and F1 score of the high-risk users are calculated for this setting. Threshold II then takes the values 0.71, 0.72, 0.73, …, 1 in turn. After all iterations, multiple groups of recall, precision and F1 values are available, and the threshold II of the iteration with the best result is selected.
Step 2: The experiment is carried out three times, that is, three months are predicted separately, so that a decision threshold is obtained for the high-risk users in each month;
Step 3: The three decision thresholds of the high-risk users are averaged, and the average decision threshold is the required threshold II for high-risk users, namely 0.7.
The decision thresholds of medium-risk users and low-risk users can be obtained with the same procedure, Step 1 to Step 3.
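Steps 1 to 3 can be sketched as a threshold sweep followed by an average. The text does not say how the "best" iteration is chosen among recall, precision and F1, so this sketch assumes the highest F1 score and keeps the lowest threshold on ties; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Recall, precision and F1 for binary labels (1 = arrears)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1


def best_threshold(probs: Sequence[float], y_true: Sequence[int],
                   lo: float = 0.70, hi: float = 1.00, step: float = 0.01) -> float:
    """Step 1: sweep threshold II over [lo, hi] and keep the best one (assumed: max F1)."""
    n_steps = round((hi - lo) / step)
    best_t, best_f1 = lo, -1.0
    for k in range(n_steps + 1):
        t = round(lo + k * step, 6)  # avoid floating-point drift in the grid
        y_pred = [1 if p > t else 0 for p in probs]
        _, _, f1 = prf(y_true, y_pred)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def averaged_threshold(months: List[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Steps 2 and 3: one threshold per predicted month, then the mean."""
    thresholds = [best_threshold(probs, y_true) for probs, y_true in months]
    return sum(thresholds) / len(thresholds)
```

For the medium-risk and low-risk levels, the same functions would be called with a different sweep range; the text does not state those ranges, so `lo` and `hi` are left as parameters.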
S4: The electricity charge risk model is verified against its output and the users' actual payment records. According to the verification result, the regularization term and the penalty factor of the logistic regression algorithm are optimized to improve recall, precision and F1 score, which determines an optimal electricity charge risk model. The optimal model is then used to predict the arrears of electricity users over multiple months in order to demonstrate the stability of the model.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A method for constructing an electricity charge risk model from electric power big data, characterized by comprising:
S1: obtaining historical feature data of current electricity users from the internal marketing system of the State Grid;
S2: constructing the electricity charge risk model with a logistic regression algorithm, according to the recall, precision and F1 score obtained on the historical feature data;
S3: using the set threshold I and threshold II, predicting the risk level of electricity users with the electricity charge risk model;
S4: verifying the electricity charge risk model against its output and the users' actual payment records, and optimizing the parameters of the logistic regression algorithm according to the verification result.
2. The method for constructing an electricity charge risk model from electric power big data according to claim 1, characterized in that the historical feature data of the electricity users are obtained as follows:
S11, feature data preprocessing:
(1) Feature data cleaning: after the feature data are obtained from the internal marketing system of the State Grid, the outliers and missing values in the feature data are handled: outliers are replaced with the value "1", and missing values are filled with the value "0";
(2) Feature data format conversion: text-type feature data are converted into numeric feature data, and categorical feature data are category-encoded;
S12, historical arrears behavior correlation analysis:
(1) Historical arrears behavior correlation analysis: the historical feature data of users are collected, and the correlation between a user's arrears and the user's payment history is analyzed;
(2) Feature correlation analysis: the correlation between different features is analyzed to find the features that are highly correlated with whether a user falls into arrears;
S13, feature data expansion:
The feature data are expanded using information on the electricity users' consumption behavior over the most recent 6 months; this information includes the maximum, minimum, mean, variance and standard deviation of the feature data;
S14, feature importance analysis:
After the feature data are expanded, they are standardized, and the feature importance of each feature is calculated using the Gini index.
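The expansion in S13 and the standardization in S14 can be illustrated as follows. This is a sketch under assumptions the claim does not settle: population variance and standard deviation are used, standardization is taken to mean z-scoring, and the function names are hypothetical. The Gini-index importance step is not shown.

```python
from statistics import mean, pstdev, pvariance
from typing import Dict, List, Sequence


def expand_features(monthly_values: Sequence[float]) -> Dict[str, float]:
    """S13: summarize the most recent 6 monthly values of one feature."""
    window = list(monthly_values)[-6:]
    return {
        "max": max(window),
        "min": min(window),
        "mean": mean(window),
        "variance": pvariance(window),  # assumption: population variance
        "std": pstdev(window),          # assumption: population standard deviation
    }


def standardize(column: Sequence[float]) -> List[float]:
    """S14 (first half): z-score one feature column across users."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - mu) / sigma for v in column]


print(expand_features([1, 2, 3, 4, 5, 6, 7]))
print(standardize([1, 2, 3]))
```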
3. The method for constructing an electricity charge risk model from electric power big data according to claim 1 or 2, characterized in that constructing the electricity charge risk model comprises four parts: selection of risk users, selection of feature data, selection of the training set, and selection of the algorithm, implemented as follows:
S21, risk user selection: two years of data are selected as the observation target; electricity users who have never been in arrears are removed from the data, and electricity users with an arrears record are taken as risk users;
S22, feature data selection: according to the feature importance information from step S14, the feature with the lowest importance is deleted in turn and an experiment is run each time, with recall, precision and F1 score as the measures; the group of features with the best experimental result is saved in the feature set;
S23, training set selection: experiments are run in turn on the risk users' electricity consumption data from different periods, and the group of electricity consumption data with the best experimental result is selected as the training set;
S24, algorithm selection: the time series algorithm, neural network algorithm, SVM algorithm, random forest algorithm and logistic regression algorithm are compared in terms of prediction accuracy and running time, and the logistic regression algorithm is selected to construct the electricity charge risk model.
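The feature selection in S22 reads like a backward elimination driven by the importance ranking from S14. A hedged sketch follows: the `score` callable stands in for "train the model on these features and measure recall, precision and F1", which the claim does not specify in detail, and reducing the three measures to one number is an assumption. The names `backward_select` and `toy_score` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def backward_select(importance: Dict[str, float],
                    score: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Drop the least important feature one at a time and keep the best-scoring set."""
    # Rank features from most to least important.
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    best_features, best_score = list(ranked), score(list(ranked))
    current = list(ranked)
    while len(current) > 1:
        current = current[:-1]        # delete the least important remaining feature
        s = score(list(current))      # one "experiment" per candidate set
        if s > best_score:
            best_features, best_score = list(current), s
    return best_features, best_score


# Toy scorer for illustration only: pretends the set {"a", "b"} works best.
def toy_score(features: List[str]) -> float:
    return 0.9 if set(features) == {"a", "b"} else 0.6


print(backward_select({"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.05}, toy_score))
```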
4. The method for constructing an electricity charge risk model from electric power big data according to claim 1, characterized in that the risk level of an electricity user is predicted with the electricity charge risk model as follows:
S31, risk level division: the probability that each risk user will fall into arrears next month is predicted with the electricity charge risk model, and different values of threshold I are set to divide the risk users into high-risk users, medium-risk users and low-risk users;
S32: for users of the same risk level, a threshold II is set to divide them into users who will fall into arrears and users who will not; a user whose arrears probability is greater than or equal to threshold II is judged to fall into arrears in the predicted month, and a user whose arrears probability is less than threshold II is judged not to fall into arrears in the predicted month.
5. The method for constructing an electricity charge risk model from electric power big data according to claim 3, characterized in that the recall in step S22 is the proportion of users truly in arrears who are correctly predicted:
$$\mathrm{recall} = \frac{TP}{TP + FN}$$
The precision is the proportion of users predicted to be in arrears who are truly in arrears:
$$\mathrm{precision} = \frac{TP}{TP + FP}$$
The F1 score is the harmonic mean of recall and precision:
$$F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where TP is the number of users who are truly in arrears and are predicted to be in arrears, TN is the number of users who are truly not in arrears and are predicted not to be in arrears, FN is the number of users who are truly in arrears but are predicted not to be in arrears, and FP is the number of users who are truly not in arrears but are predicted to be in arrears.
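As a worked illustration of these three measures, the sketch below computes them from confusion counts. The counts in the example are made up for illustration and do not come from the patent; the function name `arrears_metrics` is hypothetical.

```python
from typing import Tuple


def arrears_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (recall, precision, F1) from confusion counts; TN is not needed."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1


# Illustrative counts: 80 arrears users caught, 20 missed, 40 false alarms.
recall, precision, f1 = arrears_metrics(tp=80, fp=40, fn=20)
print(f"recall={recall:.3f} precision={precision:.3f} F1={f1:.3f}")
# prints: recall=0.800 precision=0.667 F1=0.727
```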
6. The method for constructing an electricity charge risk model from electric power big data according to claim 5, characterized in that the logistic regression algorithm in step S24 is implemented as follows:
Assume that the sample is {X, y}, where y takes the value 0 or 1: y = 0 means "no arrears, i.e. the user paid the electricity charge on time", and y = 1 means "arrears, i.e. the user failed to pay the electricity charge on time". X is an n-dimensional sample feature vector containing $x_1, x_2, \dots, x_n$. Assuming that sample X belongs to the negative class, the probability of arrears is given by formula (1):
where $g(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$, θ is the regression coefficient, and θ contains $\theta_0, \theta_1, \dots, \theta_n$;
The value of the regression coefficient θ is determined as follows:
S24-1: Initialize the value of the regression coefficient θ;
S24-2: Substitute the regression coefficient θ into formula (1) to obtain the output value of the electricity charge risk model;
S24-3: Calculate, according to formula (2), the error between the output value of the electricity charge risk model and the actual value y of the data:
S24-4: Update the regression coefficient $\theta_i$ according to formula (3):
where i = 1, 2, 3, …, n, and α is a constant coefficient;
S24-5: Given a threshold t, judge whether the condition [formula not reproduced] holds; if it holds, execute step S24-6; otherwise, return to step S24-2;
S24-6: Output the values of the regression coefficients $\theta_i$ and determine the logistic regression model from them.
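Formulas (1) to (3) and the stopping condition are not reproduced in this text, so the loop S24-1 to S24-6 can only be sketched under assumptions. The sketch below assumes the standard logistic function for formula (1), the difference between output and label for formula (2), a batch gradient-descent update with learning rate α for formula (3), and "largest coefficient change below t" as the stopping test. These choices, the function names and the `max_iter` safeguard are illustrative and not taken from the patent.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta: Sequence[float], x: Sequence[float]) -> float:
    """Assumed formula (1): sigmoid of g(x) = theta_0 + sum(theta_i * x_i)."""
    g = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(g)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 alpha: float = 0.1, t: float = 1e-4,
                 max_iter: int = 10_000) -> List[float]:
    n, m = len(X[0]), len(X)
    theta = [0.0] * (n + 1)                                   # S24-1: initialize theta
    for _ in range(max_iter):
        # S24-2 and S24-3: model output and (assumed) error = output - label
        errors = [predict_proba(theta, x) - yi for x, yi in zip(X, y)]
        # S24-4: assumed update theta_i := theta_i - alpha * mean(error * x_i)
        grad = [sum(errors) / m] + [
            sum(e * x[i] for e, x in zip(errors, X)) / m for i in range(n)
        ]
        new_theta = [th - alpha * g for th, g in zip(theta, grad)]
        # S24-5: assumed stopping test on the largest coefficient change
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < t
        theta = new_theta
        if converged:
            break
    return theta                                              # S24-6: output theta


theta = fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print(theta, predict_proba(theta, [3.0]))
```

The regularization term and penalty factor mentioned in step S4 are not included here; adding them would change the update in S24-4.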
CN201910509737.9A 2019-06-13 2019-06-13 A kind of electricity charge risk model construction method of electric power big data Pending CN110210686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910509737.9A CN110210686A (en) 2019-06-13 2019-06-13 A kind of electricity charge risk model construction method of electric power big data

Publications (1)

Publication Number Publication Date
CN110210686A true CN110210686A (en) 2019-09-06

Family

ID=67792298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910509737.9A Pending CN110210686A (en) 2019-06-13 2019-06-13 A kind of electricity charge risk model construction method of electric power big data

Country Status (1)

Country Link
CN (1) CN110210686A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030023466A1 (en) * 2001-07-27 2003-01-30 Harper Charles N. Decision support system and method
CN106251049A (en) * 2016-07-25 2016-12-21 国网浙江省电力公司宁波供电公司 A kind of electricity charge risk model construction method of big data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
厉建宾等: ""基于大数据分析的客户电费风险预测及防控"", 《电力大数据》 *
李晓蕾等: ""基于改进随机森林的电力用户欠费风险分析预警"", 《电测与仪表》 *
赵少东等: ""基于熵值法的电力客户敏感度综合评价模型研究"", 《电工技术》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111198907A (en) * 2019-12-24 2020-05-26 深圳供电局有限公司 Method and device for identifying potential defaulting user, computer equipment and storage medium
CN111310785A (en) * 2020-01-15 2020-06-19 杭州华网信息技术有限公司 National power grid mechanical external damage prediction method
CN111340375A (en) * 2020-02-28 2020-06-26 创新奇智(上海)科技有限公司 Electricity charge recycling risk prediction method and device, electronic equipment and storage medium
CN111639883A (en) * 2020-06-15 2020-09-08 江苏电力信息技术有限公司 Electricity charge recycling risk prediction method based on machine learning
CN112036682A (en) * 2020-07-10 2020-12-04 广西电网有限责任公司 Early warning method and device for frequent power failure
CN112614342A (en) * 2020-12-10 2021-04-06 大唐高鸿数据网络技术股份有限公司 Early warning method for road abnormal event, vehicle-mounted equipment and road side equipment
CN112801709A (en) * 2021-02-05 2021-05-14 杭州拼便宜网络科技有限公司 User loss prediction method, device, equipment and storage medium
CN112948367A (en) * 2021-03-24 2021-06-11 国网浙江省电力有限公司物资分公司 Data cleaning system for power material configuration demand measurement and calculation
CN113256008A (en) * 2021-05-31 2021-08-13 国家电网有限公司大数据中心 Arrearage risk level determination method, device, equipment and storage medium
CN113592140A (en) * 2021-06-22 2021-11-02 国网宁夏电力有限公司吴忠供电公司 Electric charge payment prediction model training system and electric charge payment prediction model
CN113554454A (en) * 2021-06-30 2021-10-26 西安图迹信息科技有限公司 Big data is sold system with electric power and equipment thereof
CN114374618A (en) * 2021-12-24 2022-04-19 中国电信股份有限公司 Training method, user arrearage off-network prediction method and device
CN115099478A (en) * 2022-06-17 2022-09-23 国网数字科技控股有限公司 User electricity consumption behavior prediction method and device, electronic equipment and storage medium
CN115905319A (en) * 2022-11-16 2023-04-04 国网山东省电力公司营销服务中心(计量中心) Automatic identification method and system for abnormal electricity charges of massive users
CN115905319B (en) * 2022-11-16 2024-04-19 国网山东省电力公司营销服务中心(计量中心) Automatic identification method and system for abnormal electricity fees of massive users
CN117151768A (en) * 2023-10-30 2023-12-01 国网浙江省电力有限公司营销服务中心 Construction method and system of wind control rule base of generated marketing event

Similar Documents

Publication Publication Date Title
CN110210686A (en) A kind of electricity charge risk model construction method of electric power big data
CN111104981B (en) Hydrological prediction precision evaluation method and system based on machine learning
CN112686464A (en) Short-term wind power prediction method and device
CN108051035B (en) The pipe network model recognition methods of neural network model based on gating cycle unit
CN112069567A (en) Method for predicting compressive strength of concrete based on random forest and intelligent algorithm
JP2023529760A (en) Early Warning Analysis Method of Expiration of Reservoir Scheduling Rules under the Impact of Climate Change
CN104572449A (en) Automatic test method based on case library
CN105701596A (en) Method for lean distribution network emergency maintenance and management system based on big data technology
CN103488869A (en) Wind power generation short-term load forecast method of least squares support vector machine
CN107426759A (en) The Forecasting Methodology and system of newly-increased base station data portfolio
CN110059845B (en) Metering device clock error trend prediction method based on time sequence evolution gene model
CN103530527A (en) Wind power probability forecasting method based on numerical weather forecasting ensemble forecasting results
CN109190907A (en) The small micro- power honesty risk index construction method of power supply station based on big data
CN114290960A (en) Method and device for acquiring battery health degree of power battery and vehicle
CN105406461A (en) Adaptive dynamic load monitoring method for power distribution network power failure events
CN112836920A (en) Coal electric unit energy efficiency state evaluation method and device and coal electric unit system
CN105471647A (en) Power communication network fault positioning method
CN111626514A (en) Electric vehicle charging load prediction method and device
CN109377761A (en) Traffic factor network establishing method based on Markov-chain model
CN110968703B (en) Method and system for constructing abnormal metering point knowledge base based on LSTM end-to-end extraction algorithm
CN114548493A (en) Method and system for predicting current overload of electric energy meter
CN105894138A (en) Optimum weighted composite prediction method for shipment amount of manufacturing industry
CN113962477A (en) Industrial electric quantity association aggregation prediction method, device, equipment and storage medium
CN112365082A (en) Public energy consumption prediction method based on machine learning
Cai et al. A K-nearest neighbor locally search regression algorithm for short-term traffic flow forecasting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190906)