CN106251049A - A kind of electricity charge risk model construction method of big data - Google Patents

A kind of electricity charge risk model construction method of big data

Info

Publication number
CN106251049A
Authority
CN
China
Prior art keywords
risk
data
variable
value
electricity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610587762.5A
Other languages
Chinese (zh)
Inventor
罗飞鹏
涂莹
欧阳柳
林森
金慧颂
周斌
马德荣
王庆娟
龙正雄
王海波
柯方圆
卢姗姗
孔旭锋
林士勇
吴亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Zhejiang Electric Power Co Ltd
Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Zhejiang Electric Power Co Ltd, Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN201610587762.5A
Publication of CN106251049A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

A big-data method for constructing an electricity charge risk model, relating to risk management and control. At present, electricity charges are collected from all customers at a uniform frequency and by a uniform method. As a result, reminders are too numerous and too frequent, while collection targets are still not met. The method comprises the following steps: (1) data preparation; (2) index system establishment; (3) correlation analysis; (4) model construction; (5) model output; (6) differentiated handling of electricity charge risk. The technical solution concentrates on high-risk users and makes collection more targeted. For low-risk users, reminder and notification steps can be reduced appropriately, the collection frequency lowered, and e-bills promoted preferentially. For high-risk users, collection work is emphasized and the collection frequency raised. Resources are thereby optimized and concentrated, and the collection success rate improves.

Description

A kind of electricity charge risk model construction method of big data
Technical field
The present invention relates to a method for constructing an electricity charge risk model, and in particular to a big-data method for constructing an electricity charge risk model.
Background art
At present, electricity charges are collected at a uniform frequency and by a uniform method. Customers who pay on time and customers with a payment risk are reminded in the same way and at the same frequency. On the one hand, reminders are too numerous and too frequent; on the other hand, collection targets are not met. This wastes resources and increases risk.
Summary of the invention
The technical problem to be solved, and the technical task proposed, by the present invention is to improve on the prior art by providing a big-data method for constructing an electricity charge risk model, so as to reduce workload and lower risk. To this end, the present invention adopts the following technical solution.
A big-data method for constructing an electricity charge risk model comprises the following steps:
One) Data preparation:
1) Data acquisition: from State Grid's internal marketing system data and electricity consumption acquisition system data, collect basic customer information, payment information, default information, illegal electricity use information and electricity consumption trend information; from external system data, collect external credit information, industry outlook evaluations and production and operation information;
2) Data checking: the acquired data are checked as follows. A. ID uniqueness: check that each ID appears only once; if an ID appears more than once, establish the cause and adjust the data. B. Range and values: check that each variable is a clearly defined field with a known or expected value range; a continuous variable must take values within the set expected range, and a nominal variable must take values listed in the dimension table. C. Missing values: check whether each field has missing values and whether its source is complete; if missing values exist, analyze why they occur and handle them according to the cause. D. Anomalous values: check whether any observation deviates from the data set; an observation that deviates from the data set is treated as an anomalous value, its cause is investigated, and it is handled accordingly;
3) Data processing: data processing includes cleaning records that contain missing values, outliers or anomalous values, and generating the relevant derived variables;
Two) Index system establishment:
Electricity customer data are taken as the sample. The data of each dimension for customers with arrears records are analyzed, and a number of variables that may be related to arrears risk are extracted. The variables include: number of late-payment penalties incurred, average payment collection duration, number of month-end payments, overdue payment rate, whether consecutive overdue payments exist, overdue duration, number of business changes, and whether the receivable electricity charge amount is stable;
Three) Correlation analysis:
Correlation analysis is performed on the variables, both original and derived, to measure the correlation between them. When the correlation coefficient between two variables exceeds a set value, the two are considered highly correlated and one of them is deleted;
Univariate analysis is performed, including association analysis between each explanatory variable and the explained variable, and chi-square analysis;
Four) Model construction:
Based on the variables (number of late-payment penalties incurred, average payment collection duration, number of month-end payments, overdue payment rate, whether consecutive overdue payments exist, overdue duration, number of business changes, and whether the receivable electricity charge amount is stable), corresponding electricity charge risk models are built for high-voltage customers, low-voltage non-residential customers and residential customers, and the probability that electricity charge risk occurs is calculated from the model;
Five) Model output:
1) Electricity customer risk grading:
The probability that electricity charge risk occurs is calculated from the constructed model. According to the customer's arrears status, electricity customer risk is divided into two classes, potential risk and actual risk:
A) Potential risk: the electricity charges have been settled at the time of model calculation. According to the assessment result output by the electricity charge risk evaluation model, the customer is assigned one of three grades: potential high risk, potential medium risk or potential low risk;
B) Actual risk: the electricity charges have not yet been settled, or arrears records still exist, at the time of model calculation. According to the risk grade output by the electricity charge risk evaluation model, the customer is assigned one of three grades: actual high risk, actual medium risk or actual low risk;
2) Electricity customer risk trend analysis:
The direction of risk change is recorded for each observation period, so that the customer's payment behavior is reflected dynamically and comprehensively;
Six) Differentiated handling of electricity charge risk:
According to the model output, differentiated charge recovery strategies and preventive measures are adopted in advance to shorten the charge recovery cycle and control business risk. For low-risk users, reminder and notification steps are reduced, the collection frequency is lowered, and e-bills are promoted preferentially; for high-risk users, collection work is carried out and the collection frequency is raised.
The electricity charge risk evaluation model is based on data about customers' direct electricity consumption behavior and related behavior. Using methods such as surveys of business personnel, grassroots interviews and rule induction, it carries out payment risk analysis, default risk analysis, illegal electricity use risk analysis, electricity consumption trend analysis, external credit information evaluation, industry outlook evaluation and production and operation information evaluation. At the same time, the data processing, basic analysis and advanced analysis modules of the "one database, three centers" data analysis center are used, together with data mining methods, to output actual-risk users and potential-risk users. Each user is given a three-level electricity charge risk label ("high risk", "medium risk" or "low risk") and a risk trend label. These labels support research into differentiated collection measures and ultimately yield a diversified collection strategy.
The focus is placed on high-risk users, which makes collection more targeted. For low-risk users, reminder and notification steps can be reduced appropriately, the collection frequency lowered, and e-bills promoted preferentially. For high-risk users, collection work is emphasized and the collection frequency raised, for example by increasing SMS reminders from once every 2 days to once a day. Resources are thereby optimized and concentrated, and the collection success rate improves.
As further improvements and supplements to the above technical solution, the present invention also includes the following additional technical features.
Model validation is carried out before the model output of step five). Electricity customers are assessed predictively with the electricity charge risk evaluation model, and the results are compared with actual arrears outcomes. The comparison analyzes the trends of the hit rate, coverage rate and lift, and the model is tuned accordingly. Hit rate = number of correct predictions / number of customers predicted to be at risk; it describes the proportion of correct results among the model's results and measures the accuracy of the model. Coverage rate = number of correct predictions / number of actual arrears cases; it describes the proportion of actual arrears users that the model identifies. Lift is the ratio of the model's hit rate to the hit rate of random screening; it is the reference standard for judging the model's effectiveness.
An outlier is a value more than 3 standard deviations above or below the mean of the corresponding variable, and an anomalous value is an observation that deviates from the data set. Methods for handling outliers and anomalous values include: adjusting the value to the nearest normal value; removing the value directly; and replacing the value with NULL;
When a data item is an anomalous value, the cause is investigated and the value is handled accordingly. If an outlier or anomalous value has no business meaning, it is removed directly or replaced with NULL.
Methods for handling missing values include: setting the missing value to a fixed value; and setting the missing value to a random value drawn from a normal distribution.
In the model construction of step four):
For the high-voltage customer model, the determined electricity charge risk equation is:
For the low-voltage non-residential customer model, the determined electricity charge risk equation is:
For the residential customer model, the determined electricity charge risk equation is:
Let p be the probability that y occurs; the probability that electricity charge risk occurs is then:
Where the indicators are, in order: the payment collection duration indicator, the penalty count indicator, the month-end payment count indicator, the overdue payment rate indicator, the consecutive-overdue indicator, the overdue duration indicator, the business change count indicator and the receivable-charge stability indicator. When several variables all have a positive combined influence on arrears risk, differences in their magnitudes can cause the coefficients of some variables to become negative when they enter the model together. p is the probability that the electricity customer falls into arrears.
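As a reading aid, the general form of a binary logistic regression over the eight indicators can be sketched as follows. The symbols x_1 to x_8 (the indicators) and β_0 to β_8 (the coefficients) are placeholders assumed here, and y is assumed to be the linear predictor; the fitted coefficients of the three customer models are not given in this text.

```latex
y = \beta_0 + \sum_{i=1}^{8} \beta_i x_i ,
\qquad
p = \frac{e^{y}}{1 + e^{y}} = \frac{1}{1 + e^{-y}}
```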
Beneficial effects: the technical solution builds separate models for high-voltage users, low-voltage non-residential users and low-voltage residential users. It provides data support for reducing charge recovery risk and raising the charge recovery rate, shortens the charge recovery cycle and lowers business risk.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 shows the relationship between the number of households in arrears and the payment collection duration.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
The electricity charge risk evaluation model is based on data about customers' direct electricity consumption behavior and related behavior. Using methods such as surveys of business personnel, grassroots interviews and rule induction, it carries out payment risk analysis, default risk analysis, illegal electricity use risk analysis, electricity consumption trend analysis, external credit information evaluation, industry outlook evaluation and production and operation information evaluation. At the same time, the data processing, basic analysis and advanced analysis modules of the "one database, three centers" data analysis center are used, together with data mining methods, to output actual-risk users and potential-risk users. Each user is given a three-level electricity charge risk label ("high risk", "medium risk" or "low risk") and a risk trend label. These labels support research into differentiated collection measures and ultimately yield a diversified collection strategy.
As shown in Fig. 1, the present invention comprises the following steps:
One, data preparation
2.2.1 Data acquisition
Existing State Grid marketing system data and electricity consumption acquisition system data are used to collect basic customer information, payment information, default information, illegal electricity use information and electricity consumption trend information. Surveys of business personnel, grassroots interviews and other external system data are used to collect external credit information, industry outlook evaluations and production and operation information. Specifically: (1) basic attributes: user number, account name, user category, industry classification, capacity, etc.; (2) payment behavior: charge issue date, actual payment date, receivable electricity charge, electricity charge actually received, payment method, etc.; (3) electricity consumption behavior: history of default electricity use, history of illegal electricity use, consumption history, credit evaluation history, etc.; (4) related information: external credit information, industry outlook evaluations, and production and operation information.
2.2.2 Data checking
After the data are obtained, data quality is checked first: (1) ID uniqueness: in the modeling base data set each user is one observation, so each ID should appear only once; otherwise the cause must be established and the data adjusted. (2) Range and values: each variable used in the modeling data set should be a clearly defined field with a known or expected value range. A continuous variable should lie within a certain expected range, and a nominal variable should take values from the dimension table. (3) Missing values: missing values are an unavoidable fact, so identifying the missing values in each field of the modeling data set, and their source, is a basic step of the integrity check. Missing values may result from errors, or may arise because no value is defined for the field. (4) Anomalous values: an anomalous value is an observation that deviates markedly from the data set, such as a value that is too large, too small or negative. Anomalous values may be caused by recording errors, or may be genuine data. The cause should be investigated and the value handled accordingly.
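The first three checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout, the field names (`customer_id`, `capacity_kva`, `category`), the expected range and the dimension-table values are all hypothetical.

```python
from collections import Counter

def check_records(records, id_field, ranges, dimension_tables):
    """Run ID, range, dimension-table and missing-value checks on dict records."""
    id_counts = Counter(r[id_field] for r in records)
    report = {
        "duplicate_ids": [i for i, n in id_counts.items() if n > 1],
        "missing": [],           # (id, field) pairs with no value
        "out_of_range": [],      # continuous values outside the expected range
        "not_in_dimension": [],  # nominal values absent from the dimension table
    }
    for r in records:
        for field, (low, high) in ranges.items():
            value = r.get(field)
            if value is None:
                report["missing"].append((r[id_field], field))
            elif not low <= value <= high:
                report["out_of_range"].append((r[id_field], field))
        for field, allowed in dimension_tables.items():
            value = r.get(field)
            if value is None:
                report["missing"].append((r[id_field], field))
            elif value not in allowed:
                report["not_in_dimension"].append((r[id_field], field))
    return report

# Hypothetical sample data, for illustration only.
records = [
    {"customer_id": "A1", "capacity_kva": 315.0, "category": "high_voltage"},
    {"customer_id": "A2", "capacity_kva": -5.0, "category": "resident"},
    {"customer_id": "A2", "capacity_kva": None, "category": "unknown"},
]
report = check_records(
    records,
    id_field="customer_id",
    ranges={"capacity_kva": (0.0, 100000.0)},
    dimension_tables={"category": {"high_voltage", "low_voltage_non_resident", "resident"}},
)
```

The function only reports findings; deciding how to adjust the data is left to the analyst, as the text requires the cause to be established first.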
2.2.3 Data processing
Data processing mainly comprises cleaning records that contain missing values, outliers or anomalous values, and generating the relevant derived variables.
(1) Methods for handling outliers and anomalous values:
● Adjust the outlier or anomalous value to the nearest normal value. For example, if an outlier is defined as lying beyond 3 standard deviations, it can be replaced with the maximum or minimum value at 3 standard deviations.
● Remove the outlier or anomalous value directly.
● Replace the outlier or anomalous value with NULL.
(2) Methods for handling missing values:
● Set the missing value to a fixed value, such as the mean, the median or a specified constant.
● Set the missing value to a random value drawn from a normal distribution.
(3) Generation of the relevant derived variables:
● Based on the "one database, three centers" data analysis center, the variable calculation function of the data processing module is used to generate the derived variables related to the arrears risk theme.
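The outlier and missing-value treatments listed above can be sketched as follows, using only the Python standard library. The function names and the choice of the population standard deviation are assumptions made for illustration; the normal-distribution fill assumes the random value is drawn with the mean and standard deviation of the observed values, which the text does not specify.

```python
import random
import statistics

def clip_outliers(values, k=3.0):
    """Move values beyond mean +/- k standard deviations to the nearest bound."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    sd = statistics.pstdev(present)
    low, high = mean - k * sd, mean + k * sd
    return [v if v is None else min(max(v, low), high) for v in values]

def fill_missing(values, strategy="mean", constant=0.0, rng=None):
    """Fill None entries with a fixed value or a normally distributed random value."""
    present = [v for v in values if v is not None]
    if strategy == "normal":
        rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
        mu, sd = statistics.fmean(present), statistics.pstdev(present)
        return [rng.gauss(mu, sd) if v is None else v for v in values]
    if strategy == "mean":
        fill = statistics.fmean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Hypothetical monthly charges with one extreme value.
clipped = clip_outliers([100.0] * 20 + [10000.0])
filled = fill_missing([1.0, None, 3.0], strategy="mean")
```

With very small samples no point can lie 3 standard deviations from the mean, so the clipping step only has an effect on reasonably large data sets.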
Two, index system establishment
Electricity customer data from Ninghai County, Ningbo City are chosen as the sample. The data of each dimension for customers with arrears records are analyzed first, and correlation analysis is performed with the basic analysis module of the "one database, three centers" data analysis center to extract a number of variables that may be related to arrears risk. The correlation analysis shows that the variables most strongly correlated with arrears risk are: number of late-payment penalties incurred, average payment collection duration, number of month-end payments, overdue payment rate, whether consecutive overdue payments exist, overdue duration, number of business changes, and whether the receivable electricity charge amount is stable. These variables are explained below.
(1) Number of penalties incurred. The number of times, within the observation period, that the customer incurred a late-payment penalty by not paying on time. The more penalties a customer has incurred, the greater the arrears risk.
(2) Average payment collection duration. The time difference between the date the charge is actually received and the date it is issued. Users with a shorter duration carry less arrears risk, and users with a longer duration carry more.
(3) Number of month-end payments. The number of times, within the observation period, that the customer paid after the 25th of the month. Arrears are more serious among users who pay at month end than among users who pay at other times, so the more month-end payments a customer has made, the greater the arrears risk.
(4) Overdue payment rate. Overdue payment rate = number of overdue payments / total number of payments. The overdue payment rate is correlated with the size of the arrears risk probability.
(5) Whether consecutive overdue payments exist. Whether the customer paid late in each of the most recent three consecutive months of the observation period. Analysis shows a clear difference in arrears proportion between customers who were overdue in the last three consecutive months and customers who were not.
(6) Overdue duration. The difference in days between the payment deadline (or the date from which the penalty starts to accrue) and the actual payment date. The shorter the overdue duration, the lower the customer's actual risk grade.
(7) Number of business changes. Business changes are services such as account transfer, new installation, change of category and suspension. Customers who have handled change business generally have a higher probability of arrears risk.
(8) Whether the receivable electricity charge amount is stable. This mainly examines the trend in the customer's receivable electricity charge amount. If the amount is stable, the risk grade is low; if it is unstable, the electricity charge risk grade is high.
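Several of these indicators can be derived directly from billing records. The sketch below is illustrative only: the record keys (`issued`, `due`, `paid`, `penalty`) are hypothetical, bills are assumed to be sorted by issue date with one bill per month, and the consecutive-overdue flag is assumed to mean that the three most recent bills were all paid late.

```python
from datetime import date
from statistics import fmean

def derive_indicators(bills):
    """Compute arrears-related indicators from one customer's bills."""
    overdue_days = [max((b["paid"] - b["due"]).days, 0) for b in bills]
    overdue_flags = [d > 0 for d in overdue_days]
    collection_days = [(b["paid"] - b["issued"]).days for b in bills]
    return {
        "penalty_count": sum(1 for b in bills if b["penalty"]),
        "avg_collection_days": fmean(collection_days),
        # Payments made after the 25th count as month-end payments.
        "month_end_count": sum(1 for b in bills if b["paid"].day > 25),
        "overdue_rate": sum(overdue_flags) / len(bills),
        "consecutive_overdue": len(bills) >= 3 and all(overdue_flags[-3:]),
        "avg_overdue_days": fmean(overdue_days),
    }

# Hypothetical billing history for one customer.
bills = [
    {"issued": date(2015, 1, 1), "due": date(2015, 1, 20), "paid": date(2015, 1, 15), "penalty": False},
    {"issued": date(2015, 2, 1), "due": date(2015, 2, 20), "paid": date(2015, 2, 26), "penalty": True},
    {"issued": date(2015, 3, 1), "due": date(2015, 3, 20), "paid": date(2015, 3, 28), "penalty": True},
    {"issued": date(2015, 4, 1), "due": date(2015, 4, 20), "paid": date(2015, 4, 22), "penalty": True},
]
indicators = derive_indicators(bills)
```

The business change count and the stability of the receivable amount need other data sources (service orders and a chosen stability measure) and are left out of this sketch.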
Three, correlation analysis
Correlation analysis is performed on the candidate modeling variables, both original and derived, to measure the correlation between them. In general, when the correlation coefficient exceeds 0.8 the two variables are highly correlated and one of them must be deleted. Some variables can be deleted on the basis of correlation analysis and experience.
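A minimal sketch of this filtering step follows. It assumes the Pearson coefficient and compares its absolute value with the threshold; both choices, along with the rule of keeping the variable that appears first, are assumptions for illustration, since the text names only "correlation coefficient > 0.8". The column names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)  # undefined for constant columns

def drop_correlated(columns, threshold=0.8):
    """Keep a variable only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical candidate variables for four customers.
columns = {
    "overdue_rate": [0.1, 0.4, 0.5, 0.9],
    "overdue_count": [1, 4, 5, 9],       # moves in step with overdue_rate
    "business_changes": [2, 0, 3, 1],
}
kept = drop_correlated(columns)
```

In practice the choice of which variable in a correlated pair to delete would also draw on business experience, as the text notes.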
Before modeling, univariate analysis is generally required, mainly association analysis between each explanatory variable and the explained variable, and chi-square analysis. This determines whether a given variable can be used for modeling and whether it needs to be transformed.
Fig. 2 shows the relationship between the number of households in arrears and the payment collection duration from January 2015 to June 2015.
Analysis shows that groups with a longer average payment collection duration have a higher arrears proportion, and that a customer who has been in arrears once is more likely to fall into arrears again.
The information value (IV) is likewise used to measure the association between an explanatory variable and the explained variable. When the IV obtained from the association analysis exceeds 0.3, the explanatory variable and the explained variable are highly associated. The IV between the number of penalties incurred and the target variable is as follows:
Variable | IV in high-voltage customer model | IV in non-residential customer model | IV in residential customer model
Number of penalties incurred | 0.5618 | 0.5304 | 0.5710
The table shows that the IV between the penalty count indicator and the target variable (whether the customer is in arrears) exceeds 0.3 in all three models. This indicates a high association between the two, so the penalty count indicator can be included in the model.
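The information value can be computed from a binned variable as sketched below. The formula used is the commonly cited one, IV = Σ (share of non-arrears customers − share of arrears customers) × ln(ratio of the two shares), which is assumed here because the text does not state it. The bin counts are hypothetical and are not the data behind the table.

```python
import math

def information_value(bins):
    """bins: list of (non_arrears_count, arrears_count) per bin of one variable."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        share_good = good / total_good
        share_bad = bad / total_bad
        # Bins with a zero count would need smoothing; omitted in this sketch.
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

# Hypothetical bins of "number of penalties incurred": 0, 1 to 2, 3 or more.
iv = information_value([(800, 50), (150, 30), (50, 20)])
include_in_model = iv > 0.3  # threshold stated in the text
```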
Four, model construction
Arrears risk research mainly uses information such as basic customer information, payment information and electricity consumption information, and applies a classification algorithm to predict whether a customer will fall into arrears. Six classes of classification algorithm are in common use: logistic regression, decision trees, neural networks, KNN, SVM and naive Bayes. Because the number of power customers is very large, and the volume of basic customer information and payment information is correspondingly large, risk modeling gives priority to algorithms that are simple and fast under parallel processing. Because arrears risk must also be divided into grades, risk modeling gives priority to algorithms whose output is easy to interpret and easy to grade. On these two grounds, combined with past experience, the logistic algorithm is preferred for arrears risk research in each market segment. The goal of the logistic model is to predict the probability that a customer is an arrears customer and to convert that probability into a risk score: the greater the probability, the higher the risk score and the higher the arrears risk grade; the smaller the probability, the lower the risk score and the lower the arrears risk grade. The algorithm can be run with the binary logistic regression function in the regression analysis of the advanced analysis module of the "one database, three centers" data analysis center.
Starting from three variables, stepwise regression is used to calculate the C value of each regression equation from 3 variables up to N variables, and the regression equation with the largest C value is selected as the optimal equation. (The C value is the area under the ROC curve. In general a regression equation is considered effective when its C value exceeds 0.75, and the larger the C value, the more reliable the equation.) Once the regression variables are determined, they are substituted into the logistic regression equation to obtain the coefficient of each variable. The equation finally determined is:
For the high-voltage customer model, the equation finally determined is:
For the low-voltage non-residential customer model, the equation finally determined is:
For the residential customer model, the equation finally determined is:
Let p be the probability that y occurs; then:
The indicators are, in order: the payment collection duration indicator, the penalty count indicator, the month-end payment count indicator, the overdue payment rate indicator, the consecutive-overdue indicator, the overdue duration indicator, the business change count indicator and the receivable-charge stability indicator. When several variables all have a positive combined influence on arrears risk, differences in their magnitudes can cause the coefficients of some variables to become negative when they enter the model together. p is the probability that the electricity customer falls into arrears.
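How a fitted logistic equation yields a probability, and how the C value is obtained, can be sketched as follows. The coefficients in the example are invented placeholders, not the patent's fitted values, and the pairwise-comparison formula for the area under the ROC curve is a standard equivalent chosen here for brevity.

```python
import math

def arrears_probability(intercept, coefficients, indicators):
    """Logistic model: p = 1 / (1 + exp(-y)), y = intercept + sum(coef * x)."""
    y = intercept + sum(c * x for c, x in zip(coefficients, indicators))
    return 1.0 / (1.0 + math.exp(-y))

def c_value(scores, labels):
    """Area under the ROC curve: share of (arrears, non-arrears) pairs ranked correctly."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Placeholder coefficients for two indicators, for illustration only.
p = arrears_probability(-2.0, [0.5, 1.0], [2.0, 1.0])  # y = 0, so p = 0.5

# Hypothetical scores and arrears labels for four customers.
auc = c_value([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
effective = auc >= 0.75  # the text treats C values above 0.75 as effective
```

In a stepwise search, `c_value` would be evaluated once per candidate equation and the equation with the largest value retained.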
Five, model validation method
Customer arrears analysis and modeling are implemented with the binary logistic regression function in the regression analysis of the "one database, three centers" data analysis center. Electricity customers are assessed predictively with the electricity charge risk evaluation model, and the results are compared with actual arrears outcomes. The comparison mainly analyzes the trends of the hit rate, coverage rate and lift, and the model is tuned accordingly.
(1) Hit rate: hit rate = sum(correct predictions) / number of customers predicted to be at risk. It describes the proportion of correct results among the model's results and measures the accuracy of the model.
(2) Coverage rate: coverage rate = sum(correct predictions) / sum(actual arrears). It describes the proportion of actual arrears users that the model identifies.
(3) Lift: the ratio of the hit rate of the model's predictions to the hit rate of random screening. It is the reference standard for judging the model's effectiveness.
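The three measures can be computed as in the sketch below. It assumes that the hit rate of random screening equals the overall share of arrears customers in the sample, which is an interpretation rather than something the text states; the sample flags are hypothetical.

```python
def validation_metrics(predicted, actual):
    """predicted/actual: equal-length lists of booleans (flagged / actually in arrears)."""
    correct = sum(1 for p, a in zip(predicted, actual) if p and a)
    hit_rate = correct / sum(predicted)      # correct / customers flagged as risky
    coverage = correct / sum(actual)         # correct / customers actually in arrears
    random_hit_rate = sum(actual) / len(actual)
    lift = hit_rate / random_hit_rate
    return hit_rate, coverage, lift

# Hypothetical results for ten customers.
predicted = [True, True, False, False, True, False, False, False, False, False]
actual = [True, False, True, False, True, False, False, False, False, False]
hit_rate, coverage, lift = validation_metrics(predicted, actual)
```

A lift above 1 means the model finds arrears customers more efficiently than picking customers at random.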
Six, model output
6.1 Electricity customer risk grades
According to the customer's arrears status at the time of model calculation, electricity customer risk is divided into two classes, potential risk and actual risk:
(1) Potential risk: the electricity charges have been settled at the time of model calculation. According to the assessment result output by the electricity charge risk evaluation model, the customer is assigned one of three grades: potential high risk, potential medium risk or potential low risk.
(2) Actual risk: the electricity charges have not yet been settled, or arrears records still exist, at the time of model calculation. According to the risk grade output by the electricity charge risk evaluation model, the customer is assigned one of three grades: actual high risk, actual medium risk or actual low risk.
6.2 Electricity customer risk trend
The risk grade trend of an electricity customer records the direction of risk change in each observation period, giving a fuller picture of the customer's payment behavior.
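The grading and trend labels can be sketched as follows. The probability cut-offs (0.3 and 0.7) are hypothetical, since the text does not state how probabilities map to the three grades, and the trend labels "up", "down" and "flat" are likewise illustrative names.

```python
LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2}

def risk_level(probability, low_cut=0.3, high_cut=0.7):
    """Map an arrears probability to a grade using assumed cut-offs."""
    if probability >= high_cut:
        return "high"
    if probability >= low_cut:
        return "medium"
    return "low"

def risk_class(probability, settled):
    """Potential risk if charges are settled at calculation time, otherwise actual risk."""
    kind = "potential" if settled else "actual"
    return f"{kind} {risk_level(probability)} risk"

def risk_trend(levels):
    """Direction of change between consecutive observation periods."""
    trend = []
    for previous, current in zip(levels, levels[1:]):
        delta = LEVEL_ORDER[current] - LEVEL_ORDER[previous]
        trend.append("up" if delta > 0 else "down" if delta < 0 else "flat")
    return trend

label = risk_class(0.82, settled=False)
trend = risk_trend(["low", "medium", "medium", "high", "low"])
```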
Seven, differentiated handling of electricity charge risk
According to the model output, differentiated charge recovery strategies and preventive measures are adopted in advance to shorten the charge recovery cycle and control business risk. For low-risk users, reminder and notification steps are reduced, the collection frequency is lowered, and e-bills are promoted preferentially; for high-risk users, collection work is carried out and the collection frequency is raised.
The big-data method for constructing an electricity charge risk model shown in Fig. 1 above is a specific embodiment of the present invention and already embodies its substantive features and progress. Equivalent modifications in aspects such as shape and structure, made in light of the present invention according to actual needs, all fall within the scope of protection of this solution.

Claims (5)

1. An electricity charge risk model construction method for big data, characterized in that it comprises the following steps:
One) Data preparation:
1) Data acquisition: from the State Grid's internal marketing system data and electricity consumption acquisition system data, collect customer basic information, payment information, breach-of-contract information, violation information, and electricity consumption trend information; from external system data, collect external credit information, industry outlook evaluation information, and production and operation information;
2) Data inspection: the acquired data are checked, including: A. ID uniqueness: check whether each ID variable occurs only once; if it occurs more than once, determine the reason and adjust the data; B. Range and values: check whether each variable is a clearly defined field with a known or expected range of values; when the data are a continuous variable, its values lie within the expected range, and when the data are a nominal variable, its values are those in the dimension table; C. Missing values: check whether each field has missing values and whether its source is complete; if missing values exist, analyse why they occur and handle them according to the cause; D. Abnormal values: check whether any observation deviates from the data set; an observation that deviates from the data set is regarded as an abnormal value, the reason for its occurrence is examined, and it is handled accordingly;
3) Data processing: data processing includes cleaning records with missing values, outliers, and abnormal values, and generating the relevant derived variables;
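Checks A to C of the data inspection step can be sketched as follows. This is a minimal illustration using plain Python records; the function name `check_records`, the field names, and the shape of the `ranges` and `allowed` arguments are assumptions made for the example, not part of the claimed method.

```python
from collections import Counter

def check_records(records, id_field, ranges, allowed):
    """Run ID-uniqueness, range/value, and missing-value checks on dict records.

    ranges:  {field: (low, high)} expected span of continuous variables (assumed known).
    allowed: {field: set_of_values} dimension-table values for nominal variables.
    Returns a dict of findings; it reports problems and does not modify the data.
    """
    id_counts = Counter(r[id_field] for r in records)
    findings = {
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "out_of_range": [],
        "not_in_dimension_table": [],
        "missing": [],
    }
    for row, r in enumerate(records):
        for field, (low, high) in ranges.items():
            value = r.get(field)
            if value is None:
                findings["missing"].append((row, field))
            elif not low <= value <= high:
                findings["out_of_range"].append((row, field))
        for field, values in allowed.items():
            value = r.get(field)
            if value is None:
                findings["missing"].append((row, field))
            elif value not in values:
                findings["not_in_dimension_table"].append((row, field))
    return findings
```

The sketch only reports findings as (row index, field) pairs; deciding how to adjust the data is left to the analysis of causes that the step describes.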
Two) Indicator system construction:
Electricity customer data are taken as the sample; the data of each dimension for customers with arrears records are analysed, and a number of variables that may be related to arrears risk are extracted; the variables include: number of late-payment penalties incurred, average payment collection duration, number of end-of-month payments, overdue payment rate, whether consecutive overdue payments exist, overdue duration, number of business changes, and whether the electricity charge receivable is stable;
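As an illustration of how some of these variables might be derived from raw billing records, a minimal sketch follows. The record keys `issued`, `due`, and `paid`, the function name `derive_variables`, and the rule that a bill counts as overdue whenever it is paid after its due date are all assumptions; the source does not define the calculations.

```python
from datetime import date

def derive_variables(bills):
    """Derive arrears-related variables from one customer's bills.

    Each bill is a dict with hypothetical keys 'issued', 'due' and 'paid'
    (datetime.date values); bills are assumed to be in chronological order
    and all of them paid.
    """
    overdue_flags = [b["paid"] > b["due"] for b in bills]
    durations = [(b["paid"] - b["issued"]).days for b in bills]
    overdue_days = [max((b["paid"] - b["due"]).days, 0) for b in bills]
    # Consecutive overdue: two neighbouring bills were both paid late.
    consecutive = any(a and b for a, b in zip(overdue_flags, overdue_flags[1:]))
    return {
        "overdue_count": sum(overdue_flags),
        "overdue_rate": sum(overdue_flags) / len(bills),
        "avg_payment_days": sum(durations) / len(durations),
        "max_overdue_days": max(overdue_days),
        "consecutive_overdue": consecutive,
    }
```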
Three) Correlation analysis:
Correlation analysis is performed on the variables, including original variables and derived variables, to measure the correlation between variables; when the correlation coefficient exceeds a set value, the two variables are considered highly correlated and one of them is deleted;
Univariate analysis is performed, including association analysis between the explanatory variables and the explained variable, and chi-square analysis;
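The correlation filter can be sketched as below. The example assumes the Pearson coefficient is the correlation measure and uses 0.8 as the set value; both are illustrative assumptions, as are the function names. Columns are assumed to be non-constant, since a constant column would give a zero denominator.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def drop_correlated(columns, threshold=0.8):
    """Keep a variable only if it is not highly correlated with one already kept.

    columns: dict mapping variable name to its list of values.
    threshold: hypothetical set value for 'highly correlated'.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Of each highly correlated pair, this sketch keeps whichever variable appears first; in practice the choice of which variable to delete would rest on business meaning.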
Four) Model construction:
Based on the variables (number of late-payment penalties incurred, average payment collection duration, number of end-of-month payments, overdue payment rate, whether consecutive overdue payments exist, overdue duration, number of business changes, and whether the electricity charge receivable is stable), a corresponding electricity charge risk model is built for each of high-voltage customers, low-voltage non-residential customers, and residential customers, and the probability that electricity charge risk occurs is calculated from the model;
Five) Model output:
1) Electricity customer risk level classification:
The probability that electricity charge risk occurs is calculated with the constructed model; according to the customer's arrears state, electricity customer risk levels are divided into two classes, potential risk and true risk:
A. Potential risk: the electricity charges have been settled at the time the model is run; according to the assessment result output by the electricity charge risk assessment model, the customer is assigned to one of three grades: potential high risk, potential risk, and potential low risk;
B. True risk: the electricity charges have not yet been settled, or an arrears record still exists, at the time the model is run; according to the risk level output by the electricity charge risk assessment model, the customer is assigned to one of three grades: true high risk, true risk, and true low risk;
2) Electricity customer risk trend analysis:
The direction of risk change in each observation period is recorded, so as to reflect the customer's payment behaviour dynamically and comprehensively;
Six) Differentiated handling of electricity charge risk:
According to the model output, differentiated electricity charge recovery strategies and preventive measures are adopted in advance, shortening the recovery cycle and controlling business risk; for low-risk users, reminder and notification steps are reduced, the frequency of payment reminders is lowered, and electronic bills are promoted preferentially; for high-risk users, collection work is carried out and the frequency of payment reminders is raised.
2. The electricity charge risk model construction method for big data according to claim 1, characterized in that: model validation is carried out before the model output of step Five); electricity customers are assessed predictively with the electricity charge risk assessment model, and the results are compared against the actual arrears outcomes, including analysing the trends of the hit rate, coverage rate, and lift, and the model is tuned accordingly; wherein, hit rate: hit rate = number of correct predictions / number of customers predicted to be at risk, which describes the proportion of correct results among the model's results and measures the accuracy of the model; coverage rate: coverage rate = number of correct predictions / actual number of customers in arrears, which describes the proportion of actual customers in arrears that the model uncovers; lift: the ratio of the hit rate of the model's predictions to the hit rate of random screening, which is a reference standard for measuring the effectiveness of the model.
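The three validation measures can be computed as in the sketch below. It assumes that the hit rate of random screening equals the overall arrears rate in the validation sample, which is a common reading but is not stated in the text; the function name `validation_metrics` is illustrative.

```python
def validation_metrics(predicted, actual):
    """Compute hit rate, coverage rate, and lift.

    predicted, actual: equal-length lists of booleans (True = arrears).
    Assumes at least one predicted and one actual arrears customer.
    """
    n = len(actual)
    hits = sum(p and a for p, a in zip(predicted, actual))  # correct predictions
    n_predicted = sum(predicted)   # customers predicted to be at risk
    n_actual = sum(actual)         # customers actually in arrears
    hit_rate = hits / n_predicted
    coverage = hits / n_actual
    base_rate = n_actual / n       # assumed hit rate of random screening
    return {"hit_rate": hit_rate, "coverage": coverage, "lift": hit_rate / base_rate}
```

A lift above 1 indicates that the model finds customers in arrears more often than screening at random would under this assumption.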
3. The electricity charge risk model construction method for big data according to claim 1, characterized in that: an outlier is a value lying more than 3 standard deviations above or below the mean of the corresponding variable, and an abnormal value is an observation that deviates from the data set; the methods for handling outliers and abnormal values include: adjusting the outlier or abnormal value to the closest normal value; directly removing the outlier or abnormal value; and replacing the outlier or abnormal value with the null value NULL;
When a data item is an abnormal value, the reason for its occurrence is examined and it is handled accordingly; if the outlier or abnormal value has no business meaning, it is directly removed or replaced with the null value NULL.
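The three handling options can be sketched as follows. The sketch uses the population standard deviation and interprets "closest normal value" as the nearest 3-standard-deviation bound; both are assumptions, and the function name and `method` labels are illustrative.

```python
from statistics import mean, pstdev

def handle_outliers(values, method="clip"):
    """Treat values outside mean +/- 3 standard deviations.

    method: 'clip' moves them to the nearest bound, 'drop' removes them,
    'null' replaces them with None (standing in for NULL).
    """
    if method not in ("clip", "drop", "null"):
        raise ValueError(f"unknown method: {method}")
    mu, sigma = mean(values), pstdev(values)
    low, high = mu - 3 * sigma, mu + 3 * sigma
    result = []
    for v in values:
        if low <= v <= high:
            result.append(v)           # normal value, kept unchanged
        elif method == "clip":
            result.append(low if v < low else high)
        elif method == "null":
            result.append(None)
        # method == 'drop': the outlier is simply not appended
    return result
```

Note that with a population standard deviation a sample needs at least 11 observations before any single value can lie more than 3 standard deviations from the mean.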
4. The electricity charge risk model construction method for big data according to claim 1, characterized in that: the methods for handling missing values include: setting the missing value to a fixed value; and setting the missing value to a random value drawn from a normal distribution.
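Both imputation options can be sketched as below. The text does not say which normal distribution the random value follows; this sketch assumes one whose mean and standard deviation are estimated from the observed values. The function name, the `method` labels, and the `seed` parameter are illustrative.

```python
import random
from statistics import mean, pstdev

def impute(values, method="fixed", fill=0.0, seed=None):
    """Replace None entries with a fixed value or with normally distributed draws.

    For method='normal', the mean and standard deviation are estimated from
    the observed (non-missing) values, which is an assumption of this sketch.
    """
    if method == "fixed":
        return [fill if v is None else v for v in values]
    if method == "normal":
        observed = [v for v in values if v is not None]
        mu, sigma = mean(observed), pstdev(observed)
        rng = random.Random(seed)  # seeded generator for reproducible draws
        return [rng.gauss(mu, sigma) if v is None else v for v in values]
    raise ValueError(f"unknown method: {method}")
```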
5. The electricity charge risk model construction method for big data according to claim 1, characterized in that: in the model construction of step Four):
For the high-voltage customer model, the electricity charge risk equation determined is:
For the low-voltage non-residential customer model, the electricity charge risk equation determined is:
For the residential customer model, the electricity charge risk equation determined is:
If the probability that y occurs is p, then the probability that electricity charge risk occurs is:
Wherein the variables of the equations are, in order: the payment collection duration index, the index of the number of late-payment penalties incurred, the index of the number of end-of-month payments, the overdue payment rate index, the index of whether consecutive overdue payments exist, the overdue duration index, the index of the number of business changes, and the index of whether the electricity charge receivable is stable; although the combined influence of several of the variables on arrears risk is positive, differences in their magnitudes can cause the coefficients of some variables to become negative when they enter the model together; p is the probability that the electricity customer incurs arrears risk.
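The equations themselves are not reproduced in this text. The wording "if the probability that y occurs is p" is consistent with a logistic form, p = 1 / (1 + e^(-y)) with y a linear combination of the indices, but that is an assumption. Under that assumption, a scoring function might look like the sketch below; every coefficient, feature name, and the function name `arrears_probability` are hypothetical and are not the patent's values.

```python
from math import exp

def arrears_probability(features, coefficients, intercept):
    """Score one customer with an assumed logistic model.

    features:     {index name: value} for the customer.
    coefficients: {index name: weight}; hypothetical, not taken from the source.
    """
    # Linear predictor y = intercept + sum of weight * value.
    y = intercept + sum(coefficients[name] * value for name, value in features.items())
    # Logistic link maps y to a probability in (0, 1).
    return 1.0 / (1.0 + exp(-y))
```

Since the claim builds a separate model for each customer segment, one such coefficient set would be held per segment (high-voltage, low-voltage non-residential, residential).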
CN201610587762.5A 2016-07-25 2016-07-25 A kind of electricity charge risk model construction method of big data Pending CN106251049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610587762.5A CN106251049A (en) 2016-07-25 2016-07-25 A kind of electricity charge risk model construction method of big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610587762.5A CN106251049A (en) 2016-07-25 2016-07-25 A kind of electricity charge risk model construction method of big data

Publications (1)

Publication Number Publication Date
CN106251049A true CN106251049A (en) 2016-12-21

Family

ID=57603387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610587762.5A Pending CN106251049A (en) 2016-07-25 2016-07-25 A kind of electricity charge risk model construction method of big data

Country Status (1)

Country Link
CN (1) CN106251049A (en)

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657544A (en) * 2017-09-14 2018-02-02 国网辽宁省电力有限公司 A kind of automatic paying method of the improved electricity charge and system
CN108022179B (en) * 2017-11-20 2024-03-26 国网福建省电力有限公司 Suspected electricity larceny subject factor determination method based on chi-square test
CN108022179A (en) * 2017-11-20 2018-05-11 国网福建省电力有限公司 A kind of doubtful stealing Subject elements based on Chi-square Test determine method
CN107895245A (en) * 2017-12-26 2018-04-10 国网宁夏电力有限公司银川供电公司 A kind of tariff recovery methods of risk assessment based on user's portrait
CN108961095A (en) * 2018-06-13 2018-12-07 国网福建省电力有限公司 A kind of intelligent collection electricity charge system based on AI
CN109063984A (en) * 2018-07-18 2018-12-21 平安科技(深圳)有限公司 Risk passenger method, apparatus, computer equipment and storage medium
CN109063984B (en) * 2018-07-18 2023-09-05 平安科技(深圳)有限公司 Method, apparatus, computer device and storage medium for risky travelers
CN108956885B (en) * 2018-07-21 2020-06-16 翼捷安全设备(昆山)有限公司 Gas detection intelligent risk early warning system based on sensor network
CN108956885A (en) * 2018-07-21 2018-12-07 翼捷安全设备(昆山)有限公司 Gas detection intelligence Warning System based on sensor network
CN109002549A (en) * 2018-07-31 2018-12-14 国政通科技有限公司 A kind of method and device for precisely hitting high-end tourism potential user
CN109255555A (en) * 2018-10-16 2019-01-22 中国电力科学研究院有限公司 Electric power big data life period of equipment estimation method based on historical operational information
CN109255555B (en) * 2018-10-16 2023-10-27 中国电力科学研究院有限公司 Electric power big data equipment life cycle estimation method based on historical operation information
CN109685526A (en) * 2018-12-12 2019-04-26 税友软件集团股份有限公司 A kind of method for evaluating credit rating of enterprise, device and relevant device
CN109858749A (en) * 2018-12-26 2019-06-07 广东电网有限责任公司 It is a kind of that charging method and system are urged based on client's reference
CN110210686A (en) * 2019-06-13 2019-09-06 郑州轻工业学院 A kind of electricity charge risk model construction method of electric power big data
CN110782140B (en) * 2019-10-11 2022-08-12 国网江苏省电力有限公司电力科学研究院 Multi-dimensional element evaluation method for electric charge recovery risk screening
CN110782140A (en) * 2019-10-11 2020-02-11 国网江苏省电力有限公司电力科学研究院 Multi-dimensional element evaluation method for electric charge recovery risk screening
CN111126776A (en) * 2019-11-26 2020-05-08 国网浙江省电力有限公司 Electricity charge risk prevention and control model construction method based on logistic regression algorithm
CN111198907A (en) * 2019-12-24 2020-05-26 深圳供电局有限公司 Method and device for identifying potential defaulting user, computer equipment and storage medium
CN111222239A (en) * 2020-01-04 2020-06-02 华北理工大学 Blast furnace ironmaking data standardization processing method and system
CN111340375A (en) * 2020-02-28 2020-06-26 创新奇智(上海)科技有限公司 Electricity charge recycling risk prediction method and device, electronic equipment and storage medium
CN111461574B (en) * 2020-04-24 2022-03-29 国网吉林省电力有限公司 User electricity charge clearing risk discovery method based on regional geographical position information
CN111461574A (en) * 2020-04-24 2020-07-28 国网吉林省电力有限公司 User electricity charge clearing risk discovery method based on regional geographical position information
CN111639882A (en) * 2020-06-15 2020-09-08 江苏电力信息技术有限公司 Deep learning-based power utilization risk judgment method
CN111639882B (en) * 2020-06-15 2023-05-19 江苏电力信息技术有限公司 Deep learning-based electricity risk judging method
CN111968268A (en) * 2020-06-29 2020-11-20 南斗六星系统集成有限公司 New energy vehicle health condition remote evaluation method and system
CN111861703A (en) * 2020-07-10 2020-10-30 深圳无域科技技术有限公司 Data-driven wind control strategy rule generation method and system and risk control method and system
CN114165777A (en) * 2020-09-10 2022-03-11 河北云酷科技有限公司 Intelligent identification model for four-pipe leakage of power plant boiler
CN114165777B (en) * 2020-09-10 2023-10-24 河北云酷科技有限公司 Intelligent recognition model for four-pipe leakage of power plant boiler
CN112633663A (en) * 2020-12-17 2021-04-09 南方电网海南数字电网研究院有限公司 Electricity charge meter reading accounting analysis system based on big data platform
CN113642825A (en) * 2021-05-28 2021-11-12 浙江惠瀜网络科技有限公司 Supervision method suitable for vehicle loan cooperation mechanism
CN113255137A (en) * 2021-05-31 2021-08-13 中铁第一勘察设计院集团有限公司 Target object strain data processing method and device and storage medium
CN113256008A (en) * 2021-05-31 2021-08-13 国家电网有限公司大数据中心 Arrearage risk level determination method, device, equipment and storage medium
CN115662464A (en) * 2022-12-29 2023-01-31 广州市云景信息科技有限公司 Method and system for intelligently identifying environmental noise
CN115730748A (en) * 2022-12-30 2023-03-03 广西电网有限责任公司 KNN algorithm-based power customer behavior prediction method and system
CN115730748B (en) * 2022-12-30 2023-06-23 广西电网有限责任公司 KNN algorithm-based power customer behavior prediction method and system

Similar Documents

Publication Publication Date Title
CN106251049A (en) A kind of electricity charge risk model construction method of big data
CN110097297B (en) Multi-dimensional electricity stealing situation intelligent sensing method, system, equipment and medium
León et al. Variability and trend-based generalized rule induction model to NTL detection in power companies
CN106780140A (en) Electric power credit assessment method based on big data
CN107862347A (en) A kind of discovery method of the electricity stealing based on random forest
CN106339942A (en) Financial information processing method and system
CN107145966A (en) Logic-based returns the analysis and early warning method of opposing electricity-stealing of probability analysis Optimized model
CN110458230A (en) A kind of distribution transforming based on the fusion of more criterions is with adopting data exception discriminating method
CN106067088A (en) E-bank accesses detection method and the device of behavior
CN110222991B (en) Metering device fault diagnosis method based on RF-GBDT
CN110119948B (en) Power consumer credit evaluation method and system based on time-varying weight dynamic combination
CN111178672B (en) Intelligent inspection method based on balance
CN109903182A (en) Power customer arrears risk analysis method and device based on random forests algorithm
CN102081781A (en) Finance modeling optimization method based on information self-circulation
CN105867341A (en) Online equipment health state self-detection method and system for tobacco processing equipment
Liu et al. Application of hierarchical clustering in tax inspection case-selecting
CN113450004A (en) Power credit report generation method and device, electronic equipment and readable storage medium
CN105550809A (en) Credit reporting system for assessment of enterprise credit
CN109102396A (en) A kind of user credit ranking method, computer equipment and readable medium
CN115905319B (en) Automatic identification method and system for abnormal electricity fees of massive users
CN107194529B (en) Power distribution network reliability economic benefit analysis method and device based on mining technology
CN108493933A (en) A kind of Characteristics of Electric Load method for digging based on depth decision Tree algorithms
CN104268804A (en) High-quality electric power customer data mining method based on hierarchical data envelopment analysis
KR102336462B1 (en) Apparatus and method of credit rating
Xie et al. The engineering of China commercial bank operational risk measurement

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161221
