CN106251049A - A kind of electricity charge risk model construction method of big data - Google Patents
- Publication number
- CN106251049A (application CN201610587762.5A)
- Authority
- CN
- China
- Prior art keywords
- risk
- data
- variable
- value
- electricity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
A big-data method for constructing an electricity charge risk model, relating to risk management and control. At present, electricity charges are collected with a uniform frequency and a uniform method. As a result, customers are pressed for payment too many times and too often, while collection still falls short of its targets. The invention comprises the following steps: one) data preparation, two) indicator system construction, three) correlation analysis, four) model construction, five) model output, six) differentiated handling of electricity charge risk. The technical solution concentrates on high-risk customers and makes collection more targeted. For low-risk customers, reminders and notices can be reduced as appropriate, the collection frequency lowered, and electronic bills promoted first; for high-risk customers, collection work is prioritized and the collection frequency raised. Resources are thus optimized and concentrated, and the collection success rate improves.
Description
Technical field
The present invention relates to a method for constructing an electricity charge risk model, and in particular to a big-data method for constructing an electricity charge risk model.
Background art
At present, electricity charges are collected with a uniform frequency and a uniform method. Customers who pay on time and customers who pose a payment risk are pressed for payment in the same way and at the same frequency. On the one hand, reminders are too numerous and too frequent; on the other hand, collection still fails to meet its targets, which wastes resources and increases risk.
Summary of the invention
The technical problem to be solved, and the technical task proposed, by the present invention is to improve on the prior art by providing a big-data method for constructing an electricity charge risk model, so as to reduce workload and lower risk. To this end, the present invention adopts the following technical solution.
A big-data method for constructing an electricity charge risk model comprises the following steps:
One) Data preparation:
1) Data acquisition: from the State Grid internal marketing system and the electricity consumption information acquisition system, collect basic customer information, payment information, breach-of-contract information, illegal-use information, and electricity consumption trend information; from external systems, collect external credit information, industry outlook evaluations, and production and operation information;
2) Data inspection: the acquired data are checked for: A. ID uniqueness: check that each ID value occurs only once; if it occurs more than once, find the cause and correct the data; B. range and values: check that each variable is a clearly defined field with a known or expected range of values; a continuous variable must take values within the expected range, and a nominal variable must take values listed in its dimension table; C. missing values: check whether each field contains missing values and identify their source; if missing values exist, analyze why they occur and handle them according to the cause; D. abnormal values: check whether any observation deviates from the data set; an observation that deviates from the data set is treated as an abnormal value, the cause of the abnormal value is examined, and the value is handled accordingly;
3) Data processing: data processing comprises cleaning records that contain missing values, outliers, or abnormal values, and generating the relevant derived variables;
Two) Indicator system construction:
Take electricity customer data as the sample; analyze the data of customers with an arrears record along each dimension, and extract the variables that may be related to arrears risk. The variables include: number of late-payment penalties incurred, average payment collection time, number of month-end payments, overdue payment rate, whether payments were overdue in consecutive months, overdue duration, number of service changes, and whether the receivable electricity charge amount is stable;
Three) Correlation analysis:
Perform correlation analysis on the variables, both original and derived, to measure the correlation between them; when the correlation coefficient between two variables exceeds a set value, the two variables are considered highly correlated and one of them is deleted;
Perform univariate analysis, including association analysis and chi-square analysis between each explanatory variable and the explained variable;
Four) Model construction:
Using the variables (number of late-payment penalties incurred, average payment collection time, number of month-end payments, overdue payment rate, whether payments were overdue in consecutive months, overdue duration, number of service changes, and whether the receivable electricity charge amount is stable), build a corresponding electricity charge risk model for each of high-voltage customers, low-voltage non-residential customers, and residential customers, and calculate from the model the probability that electricity charge risk occurs;
Five) Model output:
1) Electricity customer risk grading:
The probability that electricity charge risk occurs is calculated from the constructed model; according to the customer's arrears status, electricity customer risk is divided into two classes, potential risk and actual risk:
A) Potential risk: the customer has settled its electricity charges at the time the model is run; according to the assessment result output by the electricity charge risk evaluation model, the customer is graded as potential high risk, potential medium risk, or potential low risk;
B) Actual risk: the customer has not yet settled its electricity charges at the time the model is run; according to the risk grade output by the electricity charge risk evaluation model, the customer is graded as actual high risk, actual medium risk, or actual low risk;
2) Electricity customer risk trend analysis:
Record the direction of risk change in each observation period, so as to reflect the customer's payment behavior dynamically and comprehensively;
Six) Differentiated handling of electricity charge risk:
According to the model output, adopt differentiated charge recovery strategies and preventive measures in advance, so as to shorten the charge recovery cycle and control operating risk. For low-risk customers, reduce reminders and notices, lower the collection frequency, and promote electronic bills first; for high-risk customers, carry out collection work and raise the collection frequency.
The electricity charge risk evaluation model is based on data about customers' direct electricity-use behavior and related behavior. Using methods such as business staff surveys, grassroots interviews, and rule induction, it covers payment risk analysis, breach-of-contract risk analysis, illegal-use risk analysis, electricity consumption trend analysis, external credit information evaluation, industry outlook evaluation, and production and operation information evaluation. It also uses the data processing, basic analysis, and advanced analysis modules of the "one repository, three centers" data analysis center and applies data mining methods to output actual-risk customers and potential-risk customers, assigning each a three-level electricity charge risk label ("high risk", "medium risk", or "low risk") and a risk trend label. These labels support research into differentiated collection measures and ultimately yield a diversified collection strategy.
The solution concentrates on high-risk customers and makes collection more targeted. For low-risk customers, reminders and notices can be reduced as appropriate, the collection frequency lowered, and electronic bills promoted first; for high-risk customers, collection work is prioritized and the collection frequency raised, for example by increasing reminder text messages from once every 2 days to once a day. Resources are thus optimized and concentrated, and the collection success rate improves.
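The differentiated strategy described above can be sketched as a lookup from risk level to a collection plan. This is only an illustration: the `CollectionPlan` type, its field names, and every number except the change from one text message every 2 days to one a day are assumptions, not values given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionPlan:
    sms_interval_days: int     # days between reminder text messages
    prefer_e_bill: bool        # promote electronic bills instead of notices
    priority_collection: bool  # assign dedicated collection effort


# Hypothetical plans. Only the high-risk interval (2 days -> 1 day) follows
# the example in the text; the 4-day low-risk interval is an assumption.
PLANS = {
    "high": CollectionPlan(sms_interval_days=1, prefer_e_bill=False, priority_collection=True),
    "medium": CollectionPlan(sms_interval_days=2, prefer_e_bill=False, priority_collection=False),
    "low": CollectionPlan(sms_interval_days=4, prefer_e_bill=True, priority_collection=False),
}


def plan_for(risk_level: str) -> CollectionPlan:
    """Return the collection plan for 'high', 'medium' or 'low' risk."""
    return PLANS[risk_level]
```

Keeping the plans in a table rather than in branching code makes it easy to adjust frequencies as validation results come in.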
As further improvements and supplements to the above technical solution, the present invention also includes the following additional technical features.
Before the model output of step Five), model validation is carried out: electricity customers are assessed predictively with the electricity charge risk evaluation model, and the results are compared against actual arrears outcomes, including analysis of the trends in hit rate, coverage rate, and lift, and the model is tuned accordingly. Here, hit rate = number of correct predictions / number of customers predicted to be at risk; it describes the proportion of correct results among the model's results and measures the accuracy of the model. Coverage rate = number of correct predictions / number of actual arrears customers; it describes the proportion of actual arrears customers that the model identifies. Lift is the ratio of the model's hit rate to the hit rate of random screening; it is the reference standard for measuring the model's effectiveness.
An outlier is a value more than 3 standard deviations above or below the variable's mean; an abnormal value is an observation that deviates from the data set. Methods for handling outliers and abnormal values include: adjusting the outlier or abnormal value to the closest normal value; directly removing the outlier or abnormal value; and replacing the outlier or abnormal value with the null value NULL;
When a data item is an abnormal value, the cause of the abnormal value is examined and the value is handled accordingly; if the outlier or abnormal value has no business meaning, it is directly removed or replaced with NULL.
Methods for handling missing values include: setting the missing value to a fixed value; and setting the missing value to a random value drawn from a normal distribution.
In the model construction of step Four):
For high-voltage customers, the electricity charge risk equation determined is: [equation not reproduced in this text]
For low-voltage non-residential customers, the electricity charge risk equation determined is: [equation not reproduced in this text]
For residential customers, the electricity charge risk equation determined is: [equation not reproduced in this text]
If the probability that y occurs is p, then the probability that electricity charge risk occurs is: [equation not reproduced in this text]
The variables in these equations (their symbols are not reproduced in this text) denote, in order: the payment collection time indicator, the late-payment penalty count indicator, the month-end payment count indicator, the overdue payment rate indicator, the consecutive-overdue indicator, the overdue duration indicator, the service change count indicator, and the receivable-charge stability indicator. Although several of these variables each have a positive overall influence on arrears risk, they differ in magnitude, so when they enter the model together the coefficients of some variables become negative. p is the probability that the electricity customer falls into arrears.
Beneficial effects: the technical solution builds separate models for high-voltage customers, low-voltage non-residential customers, and low-voltage residential customers. It provides data support for reducing charge recovery risk and raising the charge recovery rate, shortens the charge recovery cycle, and lowers operating risk.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 shows the relationship between the number of customers in arrears and payment collection time.
Detailed description
The technical solution is described in further detail below with reference to the accompanying drawings.
The electricity charge risk evaluation model is based on data about customers' direct electricity-use behavior and related behavior. Using methods such as business staff surveys, grassroots interviews, and rule induction, it covers payment risk analysis, breach-of-contract risk analysis, illegal-use risk analysis, electricity consumption trend analysis, external credit information evaluation, industry outlook evaluation, and production and operation information evaluation. It also uses the data processing, basic analysis, and advanced analysis modules of the "one repository, three centers" data analysis center and applies data mining methods to output actual-risk customers and potential-risk customers, assigning each a three-level electricity charge risk label ("high risk", "medium risk", or "low risk") and a risk trend label. These labels support research into differentiated collection measures and ultimately yield a diversified collection strategy.
As shown in Fig. 1, the present invention comprises the following steps:
One. Data preparation
2.2.1 Data acquisition
Using State Grid's existing marketing system data and electricity consumption information acquisition system data, collect basic customer information, payment information, breach-of-contract information, illegal-use information, and electricity consumption trend information. Using business staff surveys, grassroots interviews, and other external system data, collect external credit information, industry outlook evaluations, and production and operation information. Specifically: (1) basic attributes: customer number, account name, customer category, industry classification, capacity, etc.; (2) payment behavior: charge issue date, actual payment date, receivable charge, charge actually received, payment method, etc.; (3) electricity-use behavior: breach-of-contract use history, illegal-use history, consumption history, credit evaluation history, etc.; (4) related information: external credit information, industry outlook evaluations, and production and operation information.
2.2.2 Data inspection
After the data are acquired, their quality should be checked first, covering: (1) ID uniqueness: in the basic modeling data set, each customer is one observation, so each ID value should occur only once; otherwise the cause must be found and the data corrected; (2) range and values: each variable used in the modeling data set should be a clearly defined field with a known or expected range of values. A continuous variable should take values within a certain expected range, and a nominal variable should take values listed in its dimension table; (3) missing values: missing values are an unavoidable fact, so identifying the missing values in each field of the modeling data set, and their source, is a basic step of the integrity check. Missing values may result from errors, or may arise because the field has no defined value for the record; (4) abnormal values: abnormal values are observations that deviate markedly from the data set, such as values that are too large, too small, or negative. An abnormal value may be caused by a recording error, or it may be genuine data. The cause of each abnormal value should be examined and the value handled accordingly.
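The four checks above can be sketched as a single pass over the records. The function below is a minimal illustration using only the Python standard library; the record layout, the field names (`customer_id`, `capacity_kva`, `customer_type`), and the allowed ranges are hypothetical and are not taken from the marketing system.

```python
from collections import Counter


def inspect(records, id_field, numeric_ranges, nominal_values):
    """Report duplicate IDs, missing values and out-of-range values.

    records: list of dicts; numeric_ranges: field -> (low, high);
    nominal_values: field -> set of values allowed by the dimension table.
    """
    ids = Counter(r[id_field] for r in records)
    report = {
        "duplicate_ids": sorted(i for i, n in ids.items() if n > 1),
        "missing": {},       # field -> number of records with no value
        "out_of_range": {},  # field -> offending values (candidate abnormal values)
    }
    for field in list(numeric_ranges) + list(nominal_values):
        values = [r.get(field) for r in records]
        n_missing = sum(v is None for v in values)
        if n_missing:
            report["missing"][field] = n_missing
        present = [v for v in values if v is not None]
        if field in numeric_ranges:
            low, high = numeric_ranges[field]
            bad = [v for v in present if not low <= v <= high]
        else:
            bad = [v for v in present if v not in nominal_values[field]]
        if bad:
            report["out_of_range"][field] = bad
    return report


# Hypothetical sample: one duplicated ID, one negative capacity,
# one missing capacity and one customer type outside the dimension table.
sample = [
    {"customer_id": "A1", "capacity_kva": 100.0, "customer_type": "high_voltage"},
    {"customer_id": "A1", "capacity_kva": -5.0, "customer_type": "residential"},
    {"customer_id": "B2", "capacity_kva": None, "customer_type": "unknown"},
]
report = inspect(
    sample,
    "customer_id",
    {"capacity_kva": (0.0, 10000.0)},
    {"customer_type": {"high_voltage", "low_voltage_non_residential", "residential"}},
)
```

The report only flags problems; deciding whether a flagged value is a recording error or genuine data remains a manual step, as the text requires.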
2.2.3 Data processing
Data processing mainly comprises cleaning records that contain missing values, outliers, or abnormal values, and generating the relevant derived variables.
(1) Methods for handling outliers and abnormal values:
● Adjust the outlier or abnormal value to the closest normal value. For example, if outliers are defined as values beyond 3 standard deviations, they can be replaced by the maximum or minimum value at 3 standard deviations.
● Directly remove the outlier or abnormal value.
● Replace the outlier or abnormal value with the null value NULL.
(2) Methods for handling missing values:
● Set the missing value to a fixed value, such as the mean, the median, or a specified constant.
● Set the missing value to a random value drawn from a normal distribution.
(3) Generation of relevant derived variables:
● Based on the "one repository, three centers" data analysis center, use the variable computation function of the data processing module to generate the derived variables related to the arrears risk theme.
Two. Indicator system construction
Electricity customer data from Ninghai County, Ningbo City, are taken as the sample. The analysis first focuses on the data of customers with an arrears record along each dimension, and correlation analysis is carried out with the basic analysis module of the "one repository, three centers" data analysis center to extract the variables that may be related to arrears risk. According to the correlation analysis, the variables most strongly correlated with arrears risk are: number of late-payment penalties incurred, average payment collection time, number of month-end payments, overdue payment rate, whether payments were overdue in consecutive months, overdue duration, number of service changes, and whether the receivable electricity charge amount is stable. These variables are explained below.
(1) Number of late-payment penalties incurred. The number of times within the observation period that the customer incurred a late-payment penalty for failing to pay on time; the more penalties a customer has incurred, the greater the arrears risk.
(2) Average payment collection time. The time difference between the date the charge is actually received and the date it is issued; customers with a shorter collection time have a smaller arrears risk, and vice versa.
(3) Number of month-end payments. The number of times within the observation period that the customer paid after the 25th of the month; customers who pay at month-end show more serious arrears than customers who pay at other times, so the more month-end payments a customer has made, the greater the arrears risk.
(4) Overdue payment rate. Overdue payment rate = number of overdue payments / total number of payments; the overdue payment rate is correlated with the probability of arrears.
(5) Consecutive overdue payments. Whether the customer paid late in each of the most recent three consecutive months of the observation period; analysis shows a marked difference in arrears ratio between customers who were overdue in each of the last three months and those who were not.
(6) Overdue duration. The number of days between the payment deadline (or the date from which the late-payment penalty accrues) and the date the charge is actually received; the shorter the overdue duration, the lower the customer's actual risk grade.
(7) Number of service changes. Service changes include account transfer, new installation, category change, suspension, and similar services; customers who have handled service changes generally have a higher probability of arrears.
(8) Stability of the receivable charge amount. This mainly examines the trend in the customer's receivable charge amount: a stable amount indicates a low risk grade, and an unstable amount indicates a high electricity charge risk grade.
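Most of the indicators above can be derived from a customer's billing history. The sketch below assumes a hypothetical bill record with `issued`, `due`, `paid`, `amount` and `penalty` fields. It also assumes that overdue duration is summarized as the longest delay and that "stable" means a coefficient of variation of at most 0.3; the text does not fix either choice. The number of service changes comes from a different data source and is omitted.

```python
from datetime import date
from statistics import mean, pstdev


def derive_indicators(bills, cv_threshold=0.3):
    """Derive arrears indicators from bills sorted by issue date."""
    overdue_days = [max((b["paid"] - b["due"]).days, 0) for b in bills]
    overdue_flags = [d > 0 for d in overdue_days]
    amounts = [b["amount"] for b in bills]
    return {
        "penalty_count": sum(b["penalty"] for b in bills),
        "avg_collection_days": mean((b["paid"] - b["issued"]).days for b in bills),
        "month_end_payments": sum(b["paid"].day > 25 for b in bills),
        "overdue_rate": sum(overdue_flags) / len(bills),
        # True only when each of the three most recent bills was paid late.
        "consecutive_overdue": len(bills) >= 3 and all(overdue_flags[-3:]),
        "max_overdue_days": max(overdue_days),
        # Assumed stability test: low relative spread of the billed amounts.
        "amount_stable": pstdev(amounts) / mean(amounts) <= cv_threshold,
    }


# Hypothetical three-month history for one customer.
history = [
    {"issued": date(2015, 1, 1), "due": date(2015, 1, 20), "paid": date(2015, 1, 10),
     "amount": 100.0, "penalty": False},
    {"issued": date(2015, 2, 1), "due": date(2015, 2, 20), "paid": date(2015, 2, 26),
     "amount": 110.0, "penalty": True},
    {"issued": date(2015, 3, 1), "due": date(2015, 3, 20), "paid": date(2015, 3, 28),
     "amount": 90.0, "penalty": True},
]
indicators = derive_indicators(history)
```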
Three. Correlation analysis
Correlation analysis is performed on the candidate modeling variables, both original and derived, to measure the correlation between variables. In general, when the correlation coefficient exceeds 0.8 the two variables are highly correlated and one of them must be deleted; a portion of the variables can thus be removed on the basis of correlation analysis and experience.
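The pruning rule above can be sketched with the Pearson correlation coefficient. Two points are assumptions: the text does not name the coefficient used, and it leaves the choice of which variable to delete to experience, whereas this sketch simply keeps whichever variable comes first. The sample columns are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # constant columns must be removed beforehand


def drop_correlated(columns, threshold=0.8):
    """Return variable names, keeping the first of each highly correlated pair."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: penalty_count moves exactly with overdue_rate.
columns = {
    "overdue_rate": [0.1, 0.2, 0.3, 0.4],
    "penalty_count": [1.0, 2.0, 3.0, 4.0],
    "service_changes": [3.0, 1.0, 4.0, 1.0],
}
kept = drop_correlated(columns)
```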
Before modeling, univariate analysis is generally required, mainly association analysis and chi-square analysis between each explanatory variable and the explained variable, to determine whether a given variable can be used for modeling and whether it needs to be transformed.
Fig. 2 shows the relationship between the number of customers in arrears and payment collection time from January 2015 to June 2015.
The analysis finds that the group with the longest average payment collection time has the highest arrears ratio, and that once a customer has been in arrears, the probability of subsequent arrears is greater.
Information Value (IV) is also used to measure the association between an explanatory variable and the explained variable; when the IV from the association analysis exceeds 0.3, the explanatory variable and the explained variable are considered highly associated. The IV between the number of late-payment penalties incurred and the target variable is as follows:
Variable | IV in the high-voltage customer model | IV in the low-voltage non-residential customer model | IV in the residential customer model |
Number of late-payment penalties incurred | 0.5618 | 0.5304 | 0.5710 |
As the table shows, the IV between the late-payment penalty count indicator and the target variable (whether the customer is in arrears) exceeds 0.3 in all three models, which indicates a high association between the two, so the indicator can be included in the model.
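For reference, Information Value is commonly computed from a binned variable as the sum over bins of (share of non-arrears customers minus share of arrears customers) times the natural logarithm of their ratio. The sketch below uses that common definition, which the text does not spell out. The bin counts are hypothetical and do not reproduce the figures in the table.

```python
from math import log


def information_value(bins):
    """IV of one binned variable.

    bins: list of (non_arrears_count, arrears_count), one pair per bin.
    Every bin is assumed to contain at least one customer of each kind.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        share_good = good / total_good
        share_bad = bad / total_bad
        # log(share_good / share_bad) is the bin's weight of evidence.
        iv += (share_good - share_bad) * log(share_good / share_bad)
    return iv


# Hypothetical bins of penalty count: 0, 1 to 2, and 3 or more penalties.
iv_penalty = information_value([(800, 20), (150, 30), (50, 50)])
# A variable whose bins carry no information has an IV of zero.
iv_flat = information_value([(10, 1), (20, 2)])
```

With the threshold given in the text, a variable would be kept as highly associated when its IV exceeds 0.3.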
Four. Model construction
Arrears risk research is mainly based on customers' basic information, payment information, electricity-use information, and similar data, and uses a classification algorithm to predict whether a customer will fall into arrears. Six families of classification algorithm are in common use: logistic regression, decision trees, neural networks, KNN, SVM, and naive Bayes. Because the number of power customers is huge and the volume of basic and payment data is very large, risk modeling favors algorithms that are simple and fast under parallel processing. Because arrears risk is also to be divided into grades, risk modeling favors algorithms whose output is easy to interpret and easy to grade. On these two grounds, together with past experience, the logistic algorithm is preferred for arrears risk research on customer segments. The logistic model predicts the probability that a customer is an arrears customer, and the arrears probability is converted into a risk score: the larger the probability, the higher the risk score and the higher the arrears risk grade; the smaller the probability, the lower the risk score and the lower the arrears risk grade. The algorithm can be run with the binary logistic regression function in the regression analysis of the advanced analysis module of the "one repository, three centers" data analysis center.
Starting from three variables, stepwise regression is used to compute the C value of each regression equation with 3 to N variables, and the equation with the largest C value is selected as the optimal equation. (The C value is the area under the ROC curve; in general, a regression equation is considered effective when its C value exceeds 0.75, and the larger the C value, the more reliable the equation.) Once the regression variables are determined, they are substituted into the logistic regression equation to obtain the coefficient of each variable.
For high-voltage customers, the equation finally determined is: [equation not reproduced in this text]
For low-voltage non-residential customers, the equation finally determined is: [equation not reproduced in this text]
For residential customers, the equation finally determined is: [equation not reproduced in this text]
If the probability that y occurs is p, then: [equation not reproduced in this text]
The variables in these equations (their symbols are not reproduced in this text) denote, in order: the payment collection time indicator, the late-payment penalty count indicator, the month-end payment count indicator, the overdue payment rate indicator, the consecutive-overdue indicator, the overdue duration indicator, the service change count indicator, and the receivable-charge stability indicator. Although several of these variables each have a positive overall influence on arrears risk, they differ in magnitude, so when they enter the model together the coefficients of some variables become negative. p is the probability that the electricity customer falls into arrears.
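Because the fitted equations are not available in this text, the sketch below shows only the general shape of the calculation, assuming the standard binary logistic form p = 1 / (1 + exp(-y)) with y a linear combination of the eight indicators. The coefficients in the example are placeholders, not the patent's fitted values. The second function computes the C value (area under the ROC curve) by pairwise comparison, which is one common way to obtain it.

```python
from math import exp


def arrears_probability(indicators, coefficients, intercept):
    """Standard logistic model: p = 1 / (1 + exp(-y)), y = intercept + sum(b_i * x_i)."""
    y = intercept + sum(b * x for b, x in zip(coefficients, indicators))
    return 1.0 / (1.0 + exp(-y))


def c_value(labels, scores):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen arrears customer (label 1)
    scores higher than a randomly chosen non-arrears customer (label 0);
    ties count as one half.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder inputs: with all-zero coefficients the model is uninformative.
p_neutral = arrears_probability([0.0] * 8, [0.0] * 8, 0.0)
# Hypothetical scores for two arrears and two non-arrears customers.
c = c_value([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Under the rule stated in the text, an equation with a C value above 0.75 would be considered effective, and among candidate equations the one with the largest C value would be selected.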
Five, model verification method
Customer arrears analysis and modeling is carried out with the binary logistic regression function under
regression analysis in the mathematical analysis center of the "one repository, three centers" platform.
Electricity customers are assessed predictively with the electricity charge risk evaluation model, and the
results are compared with actual arrears outcomes for verification. The analysis focuses on the trends in
hit rate, coverage rate and lift, and the model is tuned accordingly.
(1) Hit rate: hit rate = sum(correct predictions) / number of customers predicted to be at risk. It
describes the proportion of correct results among the model's results and measures the accuracy of the model.
(2) Coverage rate: coverage rate = sum(correct predictions) / sum(actual arrears). It describes the
proportion of actual arrears customers that the model identifies.
(3) Lift: the ratio of the hit rate of the model's predictions to the hit rate of random screening; it is
the reference standard for measuring model effectiveness.
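The three verification indicators can be computed as in the sketch below. The data are hypothetical, the function name is invented for this sketch, and the hit rate of random screening is assumed to equal the overall share of customers in arrears, which the text does not state explicitly.

```python
def validation_metrics(predicted_risky, actual_arrears):
    """Compute hit rate, coverage rate and lift.

    Both arguments are parallel lists of booleans, one entry per customer.
    """
    if len(predicted_risky) != len(actual_arrears):
        raise ValueError("inputs must have the same length")
    total = len(actual_arrears)
    n_predicted = sum(predicted_risky)
    n_actual = sum(actual_arrears)
    if n_predicted == 0 or n_actual == 0:
        raise ValueError("need at least one predicted and one actual arrears customer")
    # Customers flagged by the model who really went into arrears.
    n_correct = sum(1 for p, a in zip(predicted_risky, actual_arrears) if p and a)

    hit_rate = n_correct / n_predicted   # correct predictions / predicted risky customers
    coverage = n_correct / n_actual      # correct predictions / actual arrears customers
    random_hit_rate = n_actual / total   # assumption: random screening hits at the base rate
    lift = hit_rate / random_hit_rate
    return {"hit_rate": hit_rate, "coverage": coverage, "lift": lift}


# Hypothetical sample of 10 customers.
predicted = [True, True, True, True, False, False, False, False, False, False]
actual = [True, True, True, False, False, False, False, True, True, False]
metrics = validation_metrics(predicted, actual)
print(metrics)
```

A lift above 1 means the model finds arrears customers more often than random screening would.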
Six, model output
6.1 Electricity customer risk grade
According to the customer's arrears status at the time of model calculation, electricity customer risk
grades fall into two classes, potential risk and true risk:
(1) Potential risk: the electricity charges have been settled at the time of model calculation; according
to the assessment result output by the electricity charge risk evaluation model, the customer is assigned
one of three grades: potential high risk, potential medium risk or potential low risk.
(2) True risk: the charges are not yet settled, or arrears records still exist, at the time of model
calculation; according to the risk grade output by the electricity charge risk evaluation model, the
customer is assigned one of three grades: true high risk, true medium risk or true low risk.
6.2 Electricity customer risk trend
The risk grade trend of an electricity customer records the direction of risk change in each observation
period, giving a fuller picture of the customer's payment behavior.
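The two-class, three-grade scheme above might be expressed as in the following sketch. The probability cutoffs (0.7 and 0.3) and the label "medium" for the middle grade are assumptions made for illustration; the text does not give the thresholds that separate the grades.

```python
def risk_grade(probability, has_unpaid_charges, high_cutoff=0.7, low_cutoff=0.3):
    """Map a model probability and the customer's arrears status to a grade.

    high_cutoff and low_cutoff are hypothetical thresholds, not values
    taken from the patent.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Unsettled charges or open arrears records -> true risk; otherwise potential risk.
    category = "true" if has_unpaid_charges else "potential"
    if probability >= high_cutoff:
        level = "high"
    elif probability >= low_cutoff:
        level = "medium"
    else:
        level = "low"
    return f"{category} {level} risk"


print(risk_grade(0.85, has_unpaid_charges=False))
print(risk_grade(0.50, has_unpaid_charges=True))
```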
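One way to record the direction of risk change per observation period is sketched below. The grade names, their ordering and the "up"/"flat"/"down" labels are hypothetical choices for this sketch.

```python
# Hypothetical ordering of risk levels, lowest to highest.
LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2}


def risk_trend(levels):
    """Return the direction of change between each pair of consecutive periods."""
    directions = []
    for previous, current in zip(levels, levels[1:]):
        delta = LEVEL_ORDER[current] - LEVEL_ORDER[previous]
        if delta > 0:
            directions.append("up")
        elif delta < 0:
            directions.append("down")
        else:
            directions.append("flat")
    return directions


# Hypothetical risk levels of one customer over five observation periods.
history = ["low", "medium", "medium", "high", "low"]
trend = risk_trend(history)
print(trend)
```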
Seven, differentiated handling of electricity charge risk:
Based on the model output, differentiated electricity charge recovery strategies and preventive measures
are adopted in advance to shorten the recovery cycle and control operating risk. For low-risk users,
reminder and notification steps are reduced, the dunning frequency is lowered, and electronic bills are
promoted first; for high-risk users, collection work is carried out and the dunning frequency is raised.
The big-data electricity charge risk model construction method shown in Figure 1 above is a specific
embodiment of the present invention and already embodies its substantive features and progress. Equivalent
modifications to its shape, structure and other aspects, made under the teaching of the present invention
according to actual needs, all fall within the scope of protection of this scheme.
Claims (5)
1. A big-data electricity charge risk model construction method, characterized by comprising the following steps:
One) data preparation:
1) data acquisition: from the State Grid internal marketing system data and electricity consumption
acquisition system data, collect customer basic information, payment information, breach-of-contract
information, violation information and electricity consumption trend information; from external system
data, collect external credit information, industry outlook evaluations, and production and operation
information;
2) data checking: the acquired data are checked, including: A, ID uniqueness: check whether each ID
variable occurs only once; if it occurs more than once, identify the cause and adjust the data; B, range
and values: check whether each variable is a clearly defined field with a known or expected value range; a
continuous variable must take values within the expected range, and a nominal variable must take a value
in the dimension table; C, missing values: check whether each field has missing values and whether its
source is complete; if missing values exist, analyze why they occur and handle them according to the
cause; D, abnormal values: check whether any observation deviates from the data set; an observation that
deviates from the data set is regarded as an abnormal value, the cause of the abnormal value is examined,
and the abnormal value is handled accordingly;
3) data processing: data processing includes cleaning records with missing values, outliers and abnormal
values, and generating related derived variables;
Two) index system construction:
Electricity customer data are taken as the sample; the information in each data dimension for customers
with arrears records is analyzed, and multiple variables that may be related to arrears risk are
extracted; the variables include: number of late-payment penalties incurred, average payment duration,
number of month-end payments, overdue payment rate, whether consecutive overdue payments exist, overdue
duration, number of business changes, and whether the receivable electricity charge amount is stable;
Three) correlation analysis:
Correlation analysis is carried out on the variables, including original variables and derived variables,
to measure the correlation between variables; when a correlation coefficient exceeds a set value, the two
variables are considered highly correlated and one of them is deleted;
Univariate analysis is carried out, including association analysis between the explanatory variables and
the explained variable, and chi-square analysis;
Four) model construction:
Based on the variables (number of late-payment penalties incurred, average payment duration, number of
month-end payments, overdue payment rate, whether consecutive overdue payments exist, overdue duration,
number of business changes, and whether the receivable electricity charge amount is stable), electricity
charge risk models are built separately for high-voltage customers, low-voltage non-residential customers
and residential customers, and the probability that electricity charge risk occurs is calculated from the
models;
Five) model output:
1) electricity customer risk grading:
The probability that electricity charge risk occurs is calculated from the constructed model; according to
the customer's arrears status, electricity customer risk grades are divided into two classes, potential
risk and true risk:
A, potential risk: the electricity charges have been settled at the time of model calculation; according
to the assessment result output by the electricity charge risk evaluation model, the customer is assigned
one of three grades: potential high risk, potential medium risk or potential low risk;
B, true risk: the charges are not yet settled, or arrears records still exist, at the time of model
calculation; according to the risk grade output by the electricity charge risk evaluation model, the
customer is assigned one of three grades: true high risk, true medium risk or true low risk;
2) electricity customer risk trend analysis:
The direction of risk change in each observation period is recorded, so as to reflect the customer's
payment behavior dynamically and comprehensively;
Six) differentiated handling of electricity charge risk:
Based on the model output, differentiated electricity charge recovery strategies and preventive measures
are adopted in advance to shorten the recovery cycle and control operating risk; for low-risk users,
reminder and notification steps are reduced, the dunning frequency is lowered, and electronic bills are
promoted first; for high-risk users, collection work is carried out and the dunning frequency is raised.
2. The big-data electricity charge risk model construction method according to claim 1, characterized in
that: model verification is carried out before step Five) model output; electricity customers are assessed
predictively with the electricity charge risk evaluation model, and the results are compared with actual
arrears outcomes for verification, including analyzing the trends in hit rate, coverage rate and lift and
tuning the model accordingly; wherein, hit rate: hit rate = number of correct predictions / number of
customers predicted to be at risk, describing the proportion of correct results among the model's results,
this index measuring the accuracy of the model; coverage rate: coverage rate = number of correct
predictions / number of actual arrears, describing the proportion of actual arrears customers that the
model identifies; lift: the ratio of the hit rate of the model's predictions to the hit rate of random
screening, being the reference standard for measuring model effectiveness.
3. The big-data electricity charge risk model construction method according to claim 1, characterized in
that: an outlier is a value beyond plus or minus 3 standard deviations of the mean of the corresponding
variable, and an abnormal value is an observation that deviates from the data set; the methods of handling
outliers and abnormal values include: adjusting the outlier or abnormal value to the nearest normal value;
directly removing the outlier or abnormal value; replacing the outlier or abnormal value with the null
value NULL;
When a datum is an abnormal value, the cause of the abnormal value is examined and it is handled
accordingly; if an outlier or abnormal value has no business meaning, it is directly removed or replaced
with the null value NULL.
4. The big-data electricity charge risk model construction method according to claim 1, characterized in
that: the methods of handling missing values include: setting the missing value to a fixed value; setting
the missing value to a random value drawn from a normal distribution.
5. The big-data electricity charge risk model construction method according to claim 1, characterized in
that: in step Four) model construction:
For the high-voltage customer model, the electricity charge risk calculation equation determined is:
For the low-voltage non-residential customer model, the electricity charge risk calculation equation determined is:
For the residential customer model, the electricity charge risk calculation equation determined is:
If the probability that y occurs is p, then the probability that electricity charge risk occurs is:
Wherein the variables are, in order: the payment duration index, the late-payment penalty count index, the
month-end payment count index, the overdue payment rate index, the consecutive-overdue index, the overdue
duration index, the business change count index, and the receivable electricity charge stability index;
although each variable on its own has a positive influence on arrears risk, the variables differ in
magnitude, so when they enter the model together the coefficients of some variables become negative; p is
the probability that an electricity customer incurs arrears risk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610587762.5A CN106251049A (en) | 2016-07-25 | 2016-07-25 | A kind of electricity charge risk model construction method of big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106251049A true CN106251049A (en) | 2016-12-21 |
Family
ID=57603387
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610587762.5A Pending CN106251049A (en) | 2016-07-25 | 2016-07-25 | A kind of electricity charge risk model construction method of big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106251049A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20161221 |