CN115204934A - National grid electricity consumption customer fraud risk prediction method based on scoring card - Google Patents
- Publication number
- CN115204934A (application number CN202210524254.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- fraud risk
- national grid
- prediction method
- scoring card
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
- G06Q30/0225—Avoiding frauds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Game Theory and Decision Science (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Educational Administration (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a scorecard-based method for predicting the fraud risk of national grid electricity customers. The technical scheme comprises: S1, data acquisition: collating data such as basic user account information, user behavior information and activity participation information, verifying data integrity and accuracy, and establishing an electricity customer feature library; S2, data cleaning; S3, feature engineering; S4, building a model with the scorecard algorithm, then training and validating it; and S5, dividing users into risk levels according to the scorecard output and configuring a different handling strategy for each level.
Description
Technical Field
The invention belongs to the field of methods for effectively intercepting risky customers and ensuring that activity rewards are issued effectively, and relates to a scorecard-based method for predicting the fraud risk of national grid electricity customers.
Background
With the growing adoption of internet apps, the number of online electricity customers has risen steadily, and the volume of electric power services handled online has grown with it. Operations aimed at online customers, such as activity operations and the issuance of rights and benefits, are becoming increasingly important. At present, risk control measures against fraud rings are lacking: a traditional risk control rule engine cannot effectively identify and intercept fraud rings, so the only recourse is to take the activity offline or the like, which gives customers of operational activities a poor experience.
Disclosure of Invention
The invention addresses the following problems of the prior art: risk control measures against fraud rings are lacking, a traditional risk control rule engine cannot effectively identify and intercept fraud rings so that the only recourse is to take the activity offline or the like, and customers of operational activities consequently have a poor experience. To this end, it provides a scorecard-based method for predicting the fraud risk of national grid electricity customers.
The technical scheme adopted by the invention to solve this technical problem is as follows: a scorecard-based national grid electricity customer fraud risk prediction method comprises the following steps,
S1, data acquisition: collating basic user account information, user behavior information and activity participation information, verifying the integrity and accuracy of the data, and establishing an electricity customer feature library;
S2, data cleaning;
S3, feature engineering;
S4, building a model with the scorecard algorithm, then training and validating it;
S5, dividing users into risk levels according to the scorecard output, and configuring different handling strategies;
in step S3, the feature engineering comprises the following sub-steps,
A1, for missing values and outliers, matching a corresponding processing strategy according to the missing proportion;
A2, adopting an optimized binning strategy to reduce the risk of model overfitting;
A3, calculating the WOE (weight of evidence) and IV (information value) of each bin of each variable from the binning result, for use in variable screening and model training.
The method takes account information, activity data, behavior data and the like from the 'Online National Grid' APP of the national grid's Zhejiang company as its basic data, establishes an electricity customer feature library, and predicts the fraud risk of electricity customers with the scorecard algorithm, so that risky customers are effectively intercepted and activity rewards are issued effectively.
Preferably, in A1, numerical features are filled using the median, the mean or linear filling, categorical features are filled using the mode, and outliers are detected and handled by the extreme value method or the interquartile range method.
Preferably, in A2, adjacent intervals that satisfy the merging condition are merged, based on the sample proportion and the positive-to-negative label proportion of the intervals into which the feature is divided, until a stopping criterion is satisfied.
Preferably, in A2, the following sub-steps are performed,
A21, sorting and initial binning: the numerical feature is sorted, and the initial number of bins is $\min(100,\ n \times 10\%)$, where $n$ is the sample size,
A22, calculating $W_i$, where
$y_i$: number of negative samples in interval $i$
$y_T$: total number of negative samples
$n_i$: number of positive samples in interval $i$
$n_T$: total number of positive samples
A23, merging intervals: the merging gain of adjacent intervals is calculated,
$E_{i,i+1} = W_{i,i+1} - W_i - W_{i+1}$
$E_{i,i-1}$ and $E_{i,i+1}$ are calculated respectively, and the first n combinations with the largest gain are selected for merging;
A24, ending the binning: the binning operation ends when the number of intervals meets the expected setting; the stopping condition is that the desired number of bins is reached or the iterations are complete.
Preferably, the WOE (weight of evidence) and IV (information value) of each bin of each variable are calculated from the binning result, for use in variable screening and model training;
according to the calculation result, features with IV less than 0.1 are removed, and the remaining features enter model training.
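For reference, the conventional definitions of WOE and IV can be written with the interval counts listed under A22 ($y_i$, $y_T$ for negative samples and $n_i$, $n_T$ for positive samples). This is a sketch of the standard convention and is not confirmed to be the exact formula of the invention. The orientation of the ratio inside the logarithm is an assumption: inverting it flips the sign of every WOE value but leaves IV unchanged.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{y_i / y_T}{n_i / n_T}\right),
\qquad
\mathrm{IV} = \sum_{i}\left(\frac{y_i}{y_T} - \frac{n_i}{n_T}\right)\mathrm{WOE}_i
```

Each term of the IV sum is non-negative, so a feature whose bins all share the same negative-to-positive mix has an IV of zero and would fall below the 0.1 screening threshold.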
Preferably, in step S4, the scorecard algorithm is used for model training and validation, with 70% of the sample data as the training set and 30% as the validation set.
The substantial effects of the invention are as follows: the method takes account information, activity data, behavior data and the like from the 'Online National Grid' APP of the national grid's Zhejiang company as its basic data, establishes an electricity customer feature library, and predicts the fraud risk of electricity customers with the scorecard algorithm, so that risky customers are effectively intercepted and activity rewards are issued effectively.
Drawings
FIG. 1 is a schematic overall flow chart of the present invention;
FIG. 2 is a comparison graph of the model effect of the training set and the test set in the present invention;
FIG. 3 is a schematic flow diagram of a characteristic binning method of the present invention;
FIG. 4 is a flow chart of the improved optimal binning strategy of the present invention.
Detailed Description
The technical solution of the present invention will be further specifically described below by way of specific examples.
Example 1:
A scorecard-based national grid electricity customer fraud risk prediction method (see FIG. 1) comprises the following steps,
S1, data acquisition
Data such as basic user account information, user behavior information and activity participation information are collated, the integrity and accuracy of the data are verified, and an electricity customer feature library is established. For modeling, sample data are extracted using June 2020 to May 2021 as the observation period, and sample labels are extracted from the period June 2021 to September 2021.
S2, data cleaning: this is prior art; data are selected according to a preset format, and the details are not repeated in this embodiment.
S3, feature engineering
A1, handling missing values and outliers
Different processing strategies are matched to different missing ratios: features with a high missing ratio are removed, numerical features are filled using the median, the mean, linear filling or similar methods, and categorical features are filled using the mode. Outliers are detected and handled by the extreme value method (for example, capping at the 1% or 99% quantile), the interquartile range method, or the like.
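The filling and capping strategies in A1 can be sketched with the Python standard library. This is a minimal illustration, not the implementation of the embodiment: the function names are hypothetical, missing values are assumed to be encoded as `None`, and the IQR multiplier `k=1.5` is the conventional choice rather than a value given in the text.

```python
import statistics

def fill_missing_numeric(values, strategy="median"):
    """Replace None entries with the median or mean of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = statistics.median(observed)
    else:
        fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

def fill_missing_categorical(values):
    """Replace None entries with the most frequent observed category (the mode)."""
    observed = [v for v in values if v is not None]
    fill = statistics.mode(observed)
    return [fill if v is None else v for v in values]

def cap_by_quantile(values, lower_pct=1, upper_pct=99):
    """Extreme value method: clip values to the given lower/upper percentiles."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]

def iqr_bounds(values, k=1.5):
    """Interquartile range method: return the (low, high) fences.

    k=1.5 is an assumed, conventional multiplier.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread
```

Values outside the fences returned by `iqr_bounds` can then be clipped or flagged, depending on which handling the feature calls for.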
A2, feature binning
In this embodiment, a custom optimal binning strategy is applied to the categorical features and the numerical features separately, which makes the features robust to abnormal data and reduces the risk of model overfitting.
A3, WOE and IV calculation
The WOE (weight of evidence) and IV (information value) of each bin of each variable are calculated from the binning result, for use in variable screening and model training.
According to the calculation result, features with IV less than 0.1 are removed; the indicators that enter model training are shown in Table 1:
Feature | IV value |
---|---|
Number of accounts logged in on the device | 0.31 |
Number of accounts associated with the IP in the last 7 days | 0.26 |
Number of activity participations in the last 7 days | 0.16 |
Red envelope amount used in the last 7 days | 0.13 |
Number of accounts bound in the last 30 days | 0.19 |
Total points in the last 30 days | 0.22 |
Total number of people referred | 0.11 |
Account age | 0.18 |
Real-name authentication | 0.19 |
Login days in the last 180 days | 0.12 |
Number of account unbindings in the last 30 days | 0.16 |
Red envelope deduction ratio in the last 7 days | 0.25 |
Points deduction ratio in the last 7 days | 0.13 |
Cumulative number of login devices | 0.14 |
Number of payments in the last 30 days | 0.11 |
Total payment amount across household accounts in the last 30 days | 0.13 |
Cumulative number of household accounts paid for | 0.12 |
Number of household accounts bound per day | 0.23 |
Table 1: Indicators entering final model training
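The screening that produces Table 1 can be illustrated as follows. This is a sketch under the conventional WOE and IV definitions, which are assumed here rather than taken from the text; the function names and the substitution of 0.5 for empty cells are hypothetical choices made only to keep the logarithm defined.

```python
import math

def woe_iv(bins):
    """Compute per-bin WOE and total IV.

    bins: list of (negatives, positives) counts, one pair per bin.
    Assumed convention: WOE_i = ln((y_i / y_T) / (n_i / n_T)).
    """
    y_T = sum(y for y, _ in bins)
    n_T = sum(n for _, n in bins)
    woes, iv = [], 0.0
    for y_i, n_i in bins:
        # An empty cell is replaced by 0.5 (an assumption) to avoid log(0).
        p_neg = max(y_i, 0.5) / y_T
        p_pos = max(n_i, 0.5) / n_T
        woe = math.log(p_neg / p_pos)
        woes.append(woe)
        iv += (p_neg - p_pos) * woe
    return woes, iv

def select_features(iv_by_feature, threshold=0.1):
    """Keep features whose IV is not below the threshold (IV < 0.1 is removed)."""
    return [name for name, iv in iv_by_feature.items() if iv >= threshold]
```

For example, `select_features({"Account age": 0.18, "hypothetical weak feature": 0.04})` keeps only `"Account age"`, which matches the rule that every feature listed in Table 1 has an IV of at least 0.1.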
S4, model training and validation (see FIG. 2)
Modeling uses the scorecard algorithm on a total of 16.58 million samples, of which 70% serve as the training set and 30% as the validation set.
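A 70/30 split of this kind can be sketched as below. The function name, the random shuffle and the fixed seed are assumptions for illustration; the text does not say how samples are assigned to the two sets.

```python
import random

def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples reproducibly and split them into two lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed keeps the split reproducible, so that training and validation metrics such as those compared in FIG. 2 refer to the same partition across runs.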
S5, application of model results
Users are divided into risk levels according to the scorecard output, and different handling strategies are configured; the level division and handling strategies are detailed in Table 2:
Score interval | Risk level | Handling strategy |
---|---|---|
(80,100] | Low risk | No action |
(60,80] | Medium risk | Monitor |
(0,60] | High risk | Intercept and blacklist |
Table 2: Risk levels and handling strategies
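The mapping in Table 2 amounts to a lookup on half-open intervals. A minimal sketch follows; the function name is hypothetical and the returned strings are illustrative English renderings of the table entries.

```python
def risk_level(score):
    """Map a scorecard score in (0, 100] to (risk level, handling strategy)."""
    if not 0 < score <= 100:
        raise ValueError("score must lie in (0, 100]")
    if score > 80:       # (80, 100]
        return "Low risk", "No action"
    if score > 60:       # (60, 80]
        return "Medium risk", "Monitor"
    return "High risk", "Intercept and blacklist"  # (0, 60]
```

Because the intervals are closed on the right, a score of exactly 80 is medium risk and a score of exactly 60 is high risk.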
The custom binning method for numerical features in this embodiment proceeds as follows (see FIG. 3):
In A2, the following sub-steps are performed,
A21, sorting and initial binning: the numerical feature is sorted, and the initial number of bins is $\min(100,\ n \times 10\%)$, where $n$ is the sample size,
A22, calculating $W_i$, where
$y_i$: number of negative samples in interval $i$
$y_T$: total number of negative samples
$n_i$: number of positive samples in interval $i$
$n_T$: total number of positive samples
A23, merging intervals: the merging gain of adjacent intervals is calculated,
$E_{i,i+1} = W_{i,i+1} - W_i - W_{i+1}$
$E_{i,i-1}$ and $E_{i,i+1}$ are calculated respectively, and the first n combinations with the largest gain are selected for merging;
A24, ending the binning: the binning operation ends when the number of intervals meets the expected setting; the stopping condition is that the desired number of bins is reached or the iterations are complete.
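Steps A21 to A24 can be sketched as a bottom-up merge. Several points are assumptions, because the text does not spell them out: $W_i$ is taken to be the IV contribution of interval $i$, the initial bins are equal-frequency cuts that ignore ties at bin edges, empty cells are replaced by 0.5 to keep the logarithm defined, and only the single best pair is merged per iteration rather than the first n combinations. All function names are hypothetical.

```python
import math

def w_value(y_i, n_i, y_T, n_T, eps=0.5):
    """Assumed W_i: the IV contribution of one interval (not confirmed by the text)."""
    p_neg = max(y_i, eps) / y_T
    p_pos = max(n_i, eps) / n_T
    return (p_neg - p_pos) * math.log(p_neg / p_pos)

def initial_bins(values, labels, max_bins=100):
    """A21: sort by value and cut into min(100, n * 10%) equal-frequency bins.

    Each bin is [upper_edge, negatives, positives]; label 1 = positive, 0 = negative.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    k = max(1, min(max_bins, n // 10))
    bins = []
    for b in range(k):
        chunk = pairs[b * n // k:(b + 1) * n // k]
        positives = sum(label for _, label in chunk)
        bins.append([chunk[-1][0], len(chunk) - positives, positives])
    return bins

def merge_bins(bins, target_bins):
    """A23/A24: merge the adjacent pair with the largest gain E until target_bins remain."""
    bins = [list(b) for b in bins]
    y_T = sum(b[1] for b in bins)
    n_T = sum(b[2] for b in bins)
    while len(bins) > target_bins:
        best_i, best_gain = 0, -math.inf
        for i in range(len(bins) - 1):
            _, y_a, n_a = bins[i]
            _, y_b, n_b = bins[i + 1]
            # E_{i,i+1} = W_{i,i+1} - W_i - W_{i+1}
            gain = (w_value(y_a + y_b, n_a + n_b, y_T, n_T)
                    - w_value(y_a, n_a, y_T, n_T)
                    - w_value(y_b, n_b, y_T, n_T))
            if gain > best_gain:
                best_i, best_gain = i, gain
        _, y_a, n_a = bins[best_i]
        edge, y_b, n_b = bins[best_i + 1]
        bins[best_i:best_i + 2] = [[edge, y_a + y_b, n_a + n_b]]
    return bins
```

Under this reading of $W_i$, merging two intervals with a similar negative-to-positive mix loses little information, so such pairs are merged first and intervals with clearly different mixes survive as separate bins.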
In this embodiment, with the improved optimal binning strategy adopted (see FIG. 4), the following steps are performed:
First, data acquisition and data cleaning are performed: data such as basic account information, user behavior information and user activity information are collected from the marketing system, the 'Online National Grid' APP application platform, and the like.
Next, feature engineering is performed: missing values are handled by methods such as median, mode, mean and linear filling; once missing values have been handled, the features are optimally binned; the WOE (weight of evidence) and IV (information value) are calculated after binning; and indicators are preliminarily screened according to their IV values and correlation coefficients.
Then model training and model validation are performed: the sample set is divided into a training set and a validation set, the scorecard model is trained on the training set, and the weight of each indicator is output. The trained model is validated by simulation on the validation set to evaluate its actual effect. Feature scores are then calculated: the score of each interval is computed from the WOE and the weight of the feature, forming the final scorecard. Finally, the scorecard is deployed and applied: the scorecard model is deployed on a system server, and differentiated strategies are adopted for users with different scores, improving the risk control quality of the 'Online National Grid'.
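How interval scores might be derived from WOE values and weights can be sketched as below. It assumes the scorecard is a logistic regression on WOE-encoded features whose output is the log-odds of fraud, which is the common construction but is not stated in the text. The scaling parameters `base_score` and `pdo` (points to double the odds) are hypothetical, as are the function names.

```python
import math

def build_scorecard(intercept, weights, woe_tables, base_score=70.0, pdo=10.0):
    """Turn WOE-based logistic weights into per-bin points.

    weights: {feature: coefficient}; woe_tables: {feature: {bin_label: woe}}.
    base_score and pdo are assumed scaling choices, not values from the text.
    """
    factor = pdo / math.log(2)
    # Score = base_score - factor * log_odds_of_fraud, split into additive parts.
    base_points = base_score - factor * intercept
    points = {
        feature: {label: -factor * weights[feature] * woe
                  for label, woe in bins.items()}
        for feature, bins in woe_tables.items()
    }
    return base_points, points

def score_user(base_points, points, user_bins):
    """Add the base points and the points of the bin each feature falls into."""
    return base_points + sum(points[f][b] for f, b in user_bins.items())
```

With this sign convention a bin with higher fraud odds contributes fewer points, so low total scores correspond to high risk.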
This embodiment takes account information, activity data, behavior data and the like from the 'Online National Grid' APP of the national grid's Zhejiang company as its basic data, establishes an electricity customer feature library, and predicts the fraud risk of electricity customers with the scorecard algorithm, so that risky customers are effectively intercepted and activity rewards are issued effectively.
The above-described embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention in any way, and other variations and modifications may be made without departing from the spirit of the invention as set forth in the claims.
Claims (6)
1. A scorecard-based national grid electricity customer fraud risk prediction method, characterized by comprising the following steps,
S1, data acquisition: collating basic user account information, user behavior information and activity participation information, verifying the integrity and accuracy of the data, and establishing an electricity customer feature library;
S2, data cleaning;
S3, feature engineering;
S4, building a model with the scorecard algorithm, then training and validating it;
S5, dividing users into risk levels according to the scorecard output, and configuring different handling strategies;
in step S3, the feature engineering comprises the following sub-steps,
A1, for missing values and outliers, matching a corresponding processing strategy according to the missing proportion;
A2, adopting an optimized binning strategy to reduce the risk of model overfitting;
A3, calculating the WOE and IV of each bin of each variable from the binning result, for use in variable screening and model training.
2. The scorecard-based national grid electricity customer fraud risk prediction method according to claim 1, characterized in that in A1, numerical features are filled using the median, the mean or linear filling, categorical features are filled using the mode, and outliers are detected and handled by the extreme value method or the interquartile range method.
3. The scorecard-based national grid electricity customer fraud risk prediction method according to claim 1, characterized in that in A2, adjacent intervals that satisfy the merging condition are merged, based on the sample proportion and the positive-to-negative label proportion of the intervals into which the feature is divided, until a stopping criterion is satisfied.
4. The scorecard-based national grid electricity customer fraud risk prediction method according to claim 1 or 3, characterized in that in A2, the following sub-steps are performed,
A21, sorting and initial binning: the numerical feature is sorted, and the initial number of bins is $\min(100,\ n \times 10\%)$, where $n$ is the sample size;
A22, calculating $W_i$, where
$y_i$: number of negative samples in interval $i$
$y_T$: total number of negative samples
$n_i$: number of positive samples in interval $i$
$n_T$: total number of positive samples
A23, merging intervals: the merging gain of adjacent intervals is calculated,
$E_{i,i+1} = W_{i,i+1} - W_i - W_{i+1}$
$E_{i,i-1}$ and $E_{i,i+1}$ are calculated respectively, and the first n combinations with the largest gain are selected for merging;
A24, ending the binning: the binning operation ends when the number of intervals meets the expected setting; the stopping condition is that the desired number of bins is reached or the iterations are complete.
5. The scorecard-based national grid electricity customer fraud risk prediction method according to claim 1, characterized in that the WOE and IV of each bin of each variable are calculated from the binning result for use in variable screening and model training;
according to the calculation result, features with IV less than 0.1 are removed, and the remaining features enter model training.
6. The scorecard-based national grid electricity customer fraud risk prediction method according to claim 1, characterized in that in step S4, the scorecard algorithm is used for model training and validation, with 70% of the sample data as the training set and 30% as the validation set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524254.8A CN115204934A (en) | 2022-05-13 | 2022-05-13 | National grid electricity consumption customer fraud risk prediction method based on scoring card |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524254.8A CN115204934A (en) | 2022-05-13 | 2022-05-13 | National grid electricity consumption customer fraud risk prediction method based on scoring card |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115204934A (en) | 2022-10-18 |
Family
ID=83575283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210524254.8A Pending CN115204934A (en) | 2022-05-13 | 2022-05-13 | National grid electricity consumption customer fraud risk prediction method based on scoring card |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115204934A (en) |
-
2022
- 2022-05-13 CN CN202210524254.8A patent/CN115204934A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||