CN109087196A - Credit scoring method, system, computer device and readable medium - Google Patents
Credit scoring method, system, computer device and readable medium
- Publication number
- CN109087196A (application number CN201810947751.2A)
- Authority
- CN
- China
- Prior art keywords
- variable
- model
- data
- module
- variables
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Engineering & Computer Science (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a credit scoring method, system, computer device and readable medium. In the data acquisition stage, multi-channel information is selected, such as telecom operator information, credit card information and debit card information; considering information about all aspects of the user enhances the predictive performance and stability of the model. Derived fields based on time slices extract information with real predictive value, which also improves the predictive performance of the model. In the data preparation stage, variables with a high missing rate and variables containing hard-to-interpret levels are deleted, which enhances the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, which enhances the stability of the model. Variables with real predictive power are selected using machine learning methods such as LASSO, which improves the predictive power of the final model.
Description
Technical field
The present invention relates to credit scoring, and more particularly to a credit scoring method, system, computer device and readable medium.
Background art
Credit scoring, also known as "credit rating" or "creditworthiness rating", is an important component and foundation of a social credit system. Traditional credit scoring methods are mostly based on expert judgment or scorecard models: a set of scoring rules is drawn up in advance from bankers' experience and then applied to a user's actual data to produce a score.
Chinese invention patent application No. 201710197889.0 discloses a patent entitled "Loan user credit rating method and system based on machine learning". That patent discloses a credit rating method that mainly comprises: collecting modeling sample data to obtain merchants' credit reports and whether they are overdue; preprocessing the credit report data, including data extraction and indicator subdivision, to obtain predictive variables and their weights; modeling the sample data with a machine learning method to obtain a prediction model; using the prediction model to predict new loan users and obtain each new user's probability of default; and scoring each new user from that probability of default to obtain the new user's credit score. That invention builds its model with a machine learning algorithm, so the model can be iterated quickly on completely new user data and can be widely applied in the computer field.
However, that patent still has several shortcomings, chiefly in the selection of the final predictive variables. It analyzes 143 predictive variables extracted from credit reports across five dimensions: credit limit, recent behavior, length of credit history, number of accounts, and repayment history. To reduce computation and increase prediction speed, 7 final model predictive variables are selected from the 143: the average used credit limit of credit cards, the time elapsed since the credit card with the most recent repayment, the number of inquiries in the last 24 months, the time elapsed since the most recent credit card, the time elapsed since the earliest card, the number of inquiries in the last 3 months, and the number of inquiries in the last six months. The document does not state which variable selection method it uses, and the selected variables do not cover the five dimensions well. Judging from the meaning of the variables, there is a degree of multicollinearity among these 7 predictive variables, so a model built on them may be unstable. The traditional credit scoring approach therefore still needs improvement.
Summary of the invention
In view of this, to address the shortcomings that remain in traditional credit scoring methods and the need to improve them, the present invention adopts the following technical solutions:
A first aspect of the present invention provides a credit scoring method, characterized in that the method comprises:
cleaning the collected raw data of credit card applicants and removing variables that do not meet preset conditions;
binning the retained character-type variables;
building a scoring model, scoring new applicants, and determining from the scoring result whether to approve the user's application.
Preferably, building the scoring model, scoring new applicants and deciding from the scoring result whether to approve the user's application comprises:
determining and extracting the data types required by the scoring model;
cleaning the extracted data to obtain consistently formatted candidate model variables;
selecting among the cleaned candidate variables and retaining variables whose explanatory power exceeds a first preset value;
computing the model's evaluation metrics to assess its overall predictive power;
deploying the scorecard model to the production environment in a language suited to that environment.
Preferably, cleaning the extracted data to obtain consistently formatted candidate model variables comprises:
analyzing the meaning of the variables and deleting post-loan variables and meaningless variables;
analyzing the null rate of the variables and deleting variables whose missing rate exceeds a second preset value;
analyzing the distribution of the variables and deleting single-level variables.
Preferably, selecting among the cleaned candidate variables comprises:
computing the univariate KS of numeric variables and retaining variables with a higher KS;
computing the univariate IV and retaining variables with a higher IV;
computing the correlation coefficients between variables and removing multicollinearity;
selecting variables with the LASSO machine learning method and retaining variables whose explanatory power exceeds a preset value.
Preferably, computing the model's evaluation metrics to assess its overall predictive power comprises:
scoring the test samples with the model to obtain the score distributions of good and bad customers, and judging the model's predictive power from the degree of overlap between the two distributions;
predicting the test samples with the model to obtain a KS value or an AUC value, and judging the model's predictive power from that value.
A second aspect of the present invention provides a credit scoring system, comprising:
a data cleaning module for cleaning the collected raw data of credit card applicants and removing variables that do not meet preset conditions;
a data classification module for binning the retained character-type variables;
a model construction module for building a scoring model, scoring new applicants, and determining from the scoring result whether to approve the user's application.
Preferably, the model construction module comprises:
an extraction module, which determines and extracts the data types required by the scoring model;
an extracted-data cleaning module, which cleans the extracted data to obtain consistently formatted candidate model variables;
a data selection module, which selects among the cleaned candidate variables and retains variables whose explanatory power exceeds a first preset value;
an evaluation module, which computes the model's evaluation metrics to assess its overall predictive power;
a deployment module, which deploys the scorecard model to the production environment in a language suited to that environment.
Preferably, the extracted-data cleaning module comprises:
a variable meaning analysis module, which analyzes the meaning of the variables and deletes post-loan variables and meaningless variables;
a variable null-rate analysis module, which analyzes the null rate of the variables and deletes variables whose missing rate exceeds a second preset value;
a variable distribution analysis module, which analyzes the distribution of the variables and deletes single-level variables.
Preferably, the data selection module comprises:
a KS computing module, which computes the univariate KS of numeric variables and retains variables with a higher KS;
an IV computing module, which computes the univariate IV and retains variables with a higher IV;
a correlation coefficient computing module, which computes the correlation coefficients between variables and removes multicollinearity;
a variable selection module, which selects variables with the LASSO machine learning method and retains variables whose explanatory power exceeds a preset value.
Preferably, the deployment module is configured to:
score the test samples with the model to obtain the score distributions of good and bad customers, and judge the model's predictive power from the degree of overlap between the two distributions;
predict the test samples with the model to obtain a KS value or an AUC value, and judge the model's predictive power from that value.
A third aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described above when executing the program.
A fourth aspect of the present invention provides a computer-readable medium on which a computer program is stored, the program implementing the method described above when executed by a processor.
The beneficial effects of the present invention are as follows:
The present invention provides a credit scoring method, system, computer device and readable medium. In the data acquisition stage, multi-channel information is selected, such as telecom operator information, credit card information and debit card information; considering information about all aspects of the user enhances the predictive performance and stability of the model. Derived fields based on time slices extract information with real predictive value, which also improves the predictive performance of the model. In the data preparation stage, variables with a high missing rate and variables containing hard-to-interpret levels are deleted, which enhances the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, which enhances the stability of the model. Variables with real predictive power are selected using machine learning methods such as LASSO, which improves the predictive power of the final model.
Brief description of the drawings
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic structural diagram of the credit scoring system in an embodiment of the present invention.
Fig. 2 shows a schematic structural diagram of the model construction module in Fig. 1.
Fig. 3 shows a schematic structural diagram of the extracted-data cleaning module in Fig. 2.
Fig. 4 shows a schematic structural diagram of the data selection module in Fig. 2.
Fig. 5 shows a schematic structural diagram of a computer device suitable for implementing a terminal device of an embodiment of the present application.
Detailed description of the embodiments
To explain the present invention more clearly, it is further described below with reference to preferred embodiments and the accompanying drawings. Similar components are denoted by the same reference numerals in the drawings. Those skilled in the art will understand that what is specifically described below is illustrative rather than restrictive and should not limit the scope of protection of the present invention.
The present invention provides a credit scoring method, which specifically comprises:
S1: clean the collected raw data of credit card applicants and remove variables that do not meet preset conditions.
S2: bin the retained character-type variables.
Specifically, for character-type variables, the number of levels of every character-type variable is computed, and levels with few observations are merged with the level closest in meaning. If a variable has a large number of levels, the k-means clustering algorithm is used to merge the many-level variable into a few-level variable, with each level represented by its observation count and its proportion of bad customers. For numeric variables, optimal segmentation, also called supervised discretization, is used: recursive partitioning splits a continuous variable into segments, and the underlying algorithm searches for a good grouping based on conditional inference.
According to the binning results, the original variables are converted to WOE (weight of evidence). The WOE transformation allows the logistic regression model to be converted into scorecard format.
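The WOE conversion described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `woe_table`, the sign convention (good share over bad share) and the additive `smoothing` constant are all assumptions introduced here.

```python
import math

def woe_table(bins, labels, smoothing=0.5):
    """Return {bin: WOE}, with WOE = ln(share of good / share of bad) per bin.

    bins      -- bin label of each applicant (any hashable value)
    labels    -- 1 for a bad customer, 0 for a good customer
    smoothing -- hypothetical additive constant that avoids log(0) in empty cells
    """
    good, bad = {}, {}
    for b, y in zip(bins, labels):
        target = bad if y == 1 else good
        target[b] = target.get(b, 0) + 1
    all_bins = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(all_bins)
    total_bad = sum(bad.values()) + smoothing * len(all_bins)
    table = {}
    for b in all_bins:
        share_good = (good.get(b, 0) + smoothing) / total_good
        share_bad = (bad.get(b, 0) + smoothing) / total_bad
        table[b] = math.log(share_good / share_bad)
    return table

# Toy data: bin "A" is mostly good customers, bin "B" mostly bad ones.
bins = ["A"] * 4 + ["B"] * 4
labels = [0, 0, 0, 1, 0, 1, 1, 1]
table = woe_table(bins, labels)
encoded = [table[b] for b in bins]  # the variable after WOE conversion
```

Under this sign convention a positive WOE marks a bin with relatively more good customers; the opposite convention is also in common use and only flips the sign of the fitted coefficients.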
S3: build a scoring model, score new applicants, and determine from the scoring result whether to approve the user's application.
Specifically, building the scoring model, scoring new applicants and deciding from the scoring result whether to approve the user's application comprises:
S31: determine and extract the data types required by the scoring model;
Specifically, the table structure and variable meanings of the information data are analyzed in detail, and potentially meaningful variables are derived from the raw data, for example derived variables based on time slices. Data are extracted from the selected data categories, and all data are merged on a unique key.
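As a rough sketch of time-slice derivation and key-based merging, the snippet below counts events in trailing windows before the application date and then joins feature tables on a customer key. The window lengths, the field names (`cnt_3m` and so on) and the dictionary-based table layout are illustrative assumptions, not details given in the source.

```python
from datetime import date

def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def time_slice_counts(events, application_dates, windows=(3, 6, 24)):
    """events: list of (customer_id, event_date).
    application_dates: {customer_id: application date}.
    Returns {customer_id: {"cnt_3m": n, ...}}, one row per unique key."""
    features = {cid: {f"cnt_{w}m": 0 for w in windows} for cid in application_dates}
    for cid, day in events:
        applied = application_dates.get(cid)
        if applied is None or day > applied:
            continue  # skip unknown customers and events after the application
        age = months_between(day, applied)
        for w in windows:
            if age < w:
                features[cid][f"cnt_{w}m"] += 1
    return features

def merge_on_key(*tables):
    """Merge several {key: row_dict} tables into one row per key."""
    merged = {}
    for table in tables:
        for key, row in table.items():
            merged.setdefault(key, {}).update(row)
    return merged

events = [("c1", date(2018, 5, 1)), ("c1", date(2018, 1, 15)),
          ("c1", date(2016, 12, 1)), ("c1", date(2018, 7, 1)),
          ("c2", date(2018, 6, 1))]
slices = time_slice_counts(events, {"c1": date(2018, 6, 10)})
wide = merge_on_key(slices, {"c1": {"age": 30}})
```

Skipping events dated after the application keeps post-application information out of the derived fields, which matches the later step of deleting post-loan variables.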
Preferably, for character-type variables with many levels, the levels can be merged with the k-means algorithm or with a decision tree algorithm.
S32: clean the extracted data to obtain consistently formatted candidate model variables.
Specifically, analyze the meaning of the variables and delete post-loan variables and meaningless variables; analyze the null rate of the variables and delete variables with a high missing rate; analyze the distribution of the variables and delete single-level variables.
The null-rate analysis and the level-count analysis are described in detail below:
Null-rate analysis: as shown in Table 1, compute the overall null rate of every variable and its null rate in each application month, and delete variables whose overall null rate is very high as well as variables whose null rate clearly depends on the application month. For categorical variables, delete those with a missing rate above 50%; for continuous variables, delete those with a missing rate above 30%.
Table 1
Variable | Null rate |
nets_bill_month_num | 0.2% |
base_fee_month1 | 0.3% |
location_type_freq | 6.2% |
recharge_month_num | 7.2% |
recharge_month_num_min | 7.2% |
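The null-rate analysis above can be sketched in a few lines. The 50% and 30% limits come from the text; the function names, the column-dictionary layout and the treatment of empty strings as missing are assumptions made for illustration.

```python
def null_rate(values):
    """Share of entries that are missing (None or empty string)."""
    if not values:
        return 1.0
    missing = sum(1 for v in values if v is None or v == "")
    return missing / len(values)

def null_rate_by_month(values, months):
    """Null rate of one variable within each application month."""
    grouped = {}
    for v, m in zip(values, months):
        grouped.setdefault(m, []).append(v)
    return {m: null_rate(vs) for m, vs in grouped.items()}

def drop_sparse_variables(columns, kinds, cat_limit=0.50, num_limit=0.30):
    """columns: {name: list of values}; kinds: {name: "categorical" or "continuous"}.
    Returns (kept columns, {name: overall null rate})."""
    kept, report = {}, {}
    for name, values in columns.items():
        rate = null_rate(values)
        report[name] = rate
        limit = cat_limit if kinds[name] == "categorical" else num_limit
        if rate <= limit:
            kept[name] = values
    return kept, report

columns = {
    "a": [1, None, 3, 4],                # continuous, 25% missing: kept
    "b": ["x", None, None, ""],          # categorical, 75% missing: dropped
    "c": [None, None, 1.0, 2.0, 3.0],    # continuous, 40% missing: dropped
}
kinds = {"a": "continuous", "b": "categorical", "c": "continuous"}
kept, report = drop_sparse_variables(columns, kinds)
monthly = null_rate_by_month([1, None, None, 4], ["01", "01", "02", "02"])
```

A per-month breakdown like `monthly` is what would reveal a variable whose null rate drifts with the application month; deciding what counts as an obvious relation is left to the analyst here.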
Level-count analysis: compute the number of levels of every character-type variable, and delete single-level variables and variables that contain many hard-to-interpret levels.
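A small sketch of the level-count analysis follows. Whether a level is hard to interpret is a judgment the source leaves to the analyst, so the `max_levels` cap used here is only a hypothetical stand-in for that judgment, and the function names are likewise illustrative.

```python
def level_counts(columns):
    """Number of distinct non-missing levels of each character-type variable."""
    return {name: len({v for v in values if v is not None})
            for name, values in columns.items()}

def drop_by_levels(columns, max_levels=50):
    """Drop single-level variables and variables with more than max_levels levels."""
    counts = level_counts(columns)
    return {name: values for name, values in columns.items()
            if 1 < counts[name] <= max_levels}

columns = {
    "const": ["a", "a", "a"],       # a single level carries no information
    "ok": ["a", "b", "a"],
    "id_like": ["u1", "u2", "u3"],  # one level per row, effectively an identifier
}
counts = level_counts(columns)
kept = drop_by_levels(columns, max_levels=2)
```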
S33: select among the cleaned candidate variables and retain variables whose explanatory power exceeds the first preset value.
Specifically, selection among the cleaned candidate variables retains variables that have strong explanatory power and are stable. Compute the univariate KS of numeric variables and retain variables with a higher KS; compute the univariate IV and retain variables with a higher IV; compute the correlation coefficients between variables and remove multicollinearity; select variables with the LASSO machine learning method and retain variables whose explanatory power exceeds a preset value.
In a specific embodiment, this is illustrated with reference to Tables 2 and 3:
Univariate IV and KS: as shown in Table 2, compute the IV of all candidate variables and the KS of all numeric variables, and rank the variables by IV and KS. The higher a variable's IV, the stronger its predictive power; variables with weak predictive power, whose IV is below 0.01, are deleted.
Table 2
Variable | IV | KS |
scorepettycashv1 | 0.0858 | 0.0781 |
scorelargecashv2 | 0.0811 | 0.0818 |
score_20 | 0.0746 | 0.1112 |
scorecreditbt | 0.0594 | 0.1022 |
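For reference, the two univariate statistics in Table 2 can be computed as sketched below. The 0.01 floor is taken from the text; the function names, the `smoothing` constant and the `weak_var` entry are hypothetical, and the IV formula is the conventional one, IV = Σ (g_i − b_i)·ln(g_i / b_i), where g_i and b_i are the shares of good and bad customers in bin i.

```python
import math

def information_value(bin_counts, smoothing=0.5):
    """bin_counts: list of (n_good, n_bad) per bin of one variable."""
    k = len(bin_counts)
    total_good = sum(g for g, _ in bin_counts) + smoothing * k
    total_bad = sum(b for _, b in bin_counts) + smoothing * k
    iv = 0.0
    for g, b in bin_counts:
        share_good = (g + smoothing) / total_good
        share_bad = (b + smoothing) / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

def ks_statistic(values, labels):
    """Largest gap between the cumulative distributions of good (0) and bad (1)
    customers over a numeric variable. Both classes must be present."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(values, labels))
    cum_good = cum_bad = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all observations tied at the same value before comparing.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best

def keep_by_iv(iv_by_variable, floor=0.01):
    """Drop weak predictors whose IV is below the floor."""
    return {name: iv for name, iv in iv_by_variable.items() if iv >= floor}

# IV values from Table 2, plus one hypothetical weak variable.
ivs = {"scorepettycashv1": 0.0858, "scorelargecashv2": 0.0811,
       "score_20": 0.0746, "scorecreditbt": 0.0594, "weak_var": 0.004}
strong = keep_by_iv(ivs)
```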
Correlation coefficients between variables: as shown in Table 3, compute the correlation coefficients between all candidate variables; where a coefficient exceeds the set threshold, delete the variable with the lower IV.
Table 3
Variable | kb_max | duration | fen_max | nets_avg | nets_diff |
kb_max | 1 | 0.003062 | 0.207013 | 0.080726 | 0.105188 |
duration | 0.003062 | 1 | 0.05663 | 0.00855 | 0.002891 |
fen_max | 0.207013 | 0.05663 | 1 | 0.070467 | 0.070851 |
nets_avg | 0.080726 | 0.00855 | 0.070467 | 1 | 0.368271 |
nets_diff | 0.105188 | 0.002891 | 0.070851 | 0.368271 | 1 |
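The correlation step can be sketched as below, assuming Pearson correlation (the source does not name the coefficient). The correlation values are taken from Table 3, but the IV values and the 0.3 threshold in the example are hypothetical, since the source gives neither for these variables.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(corr, iv, threshold=0.7):
    """corr: {(a, b): coefficient} for unordered variable pairs; iv: {name: IV}.
    For each pair above the threshold, drop the variable with the lower IV."""
    dropped = set()
    # Visit the most strongly correlated pairs first.
    for (a, b), r in sorted(corr.items(), key=lambda item: -abs(item[1])):
        if abs(r) <= threshold or a in dropped or b in dropped:
            continue
        dropped.add(a if iv[a] < iv[b] else b)
    return [name for name in iv if name not in dropped]

corr = {("nets_avg", "nets_diff"): 0.368271,   # coefficients from Table 3
        ("kb_max", "nets_avg"): 0.080726,
        ("kb_max", "nets_diff"): 0.105188}
iv = {"kb_max": 0.03, "nets_avg": 0.05, "nets_diff": 0.02}  # hypothetical IVs
kept = drop_correlated(corr, iv, threshold=0.3)
```

With the default threshold of 0.7 none of the Table 3 pairs would be removed, since the largest coefficient shown there is 0.368271.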
LASSO variable selection: the LASSO machine learning method is used to select the variables with stronger predictive power from all candidate variables. In this embodiment, the variable-selection stage may use either the LASSO method or least angle regression to select variables; this is not described further here.
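To make the selection mechanism concrete, here is a bare-bones LASSO fitted by cyclic coordinate descent on a squared-error loss, (1/2n)·‖y − Xβ‖² + α·‖β‖₁. It is a teaching sketch under stated assumptions: the source does not say which loss or solver is used, a credit model would more likely pair the L1 penalty with logistic regression, and the features are assumed to be centered and on comparable scales. Variables whose coefficient is driven exactly to zero are the ones LASSO discards.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty, clipping at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_select(X, y, alpha, n_iter=200):
    """X: list of rows, y: list of targets (both assumed centered).
    Returns (coefficients, indices of variables with non-zero coefficient)."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant column, nothing to fit
            # Correlation of column j with the residual that excludes its own effect.
            rho = sum(v * (r + v * coef[j]) for v, r in zip(col, residual)) / n
            new = soft_threshold(rho, alpha) / z
            delta = new - coef[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                coef[j] = new
    selected = [j for j, c in enumerate(coef) if c != 0.0]
    return coef, selected

# Toy data: y depends only on the first of two orthogonal features.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]
coef, selected = lasso_select(X, y, alpha=0.5)
```

Raising `alpha` removes more variables; with `alpha=3` on this toy data every coefficient is shrunk to zero.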
Stepwise regression: the final variable selection is performed with logistic regression and stepwise regression, and variables that are not very significant in the stepwise regression are removed manually.
Preferably, if two variables are strongly correlated, one may retain the variable with the higher IV, or alternatively the variable with the more stable distribution or the one that is easier to explain from a business standpoint.
S34: compute the model's evaluation metrics to assess its overall predictive power.
In a specific embodiment, the test samples are scored with the model to obtain the score distributions of good and bad customers, and the model's predictive power is judged from the degree of overlap between the two distributions; the test samples are predicted with the model to obtain a KS value or an AUC value, and the model's predictive power is judged from that value.
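Two of the evaluation measures named above can be sketched as follows. The AUC is computed directly from its pairwise definition (quadratic in the sample size, so suitable only for small test sets), and the overlap is measured as a histogram overlap coefficient; the source does not specify how overlap is quantified, so that measure and the `bin_width` parameter are assumptions.

```python
def auc(probabilities, labels):
    """Probability that a random bad customer (label 1) receives a higher
    predicted default probability than a random good customer (label 0).
    Ties count as half a win. Both classes must be present."""
    bad = [p for p, y in zip(probabilities, labels) if y == 1]
    good = [p for p, y in zip(probabilities, labels) if y == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))

def score_overlap(scores, labels, bin_width=50):
    """Histogram overlap of good and bad score distributions:
    0.0 means fully separated, 1.0 means identical histograms."""
    good, bad = {}, {}
    for s, y in zip(scores, labels):
        target = bad if y == 1 else good
        key = int(s // bin_width)
        target[key] = target.get(key, 0) + 1
    n_good, n_bad = sum(good.values()), sum(bad.values())
    return sum(min(good.get(k, 0) / n_good, bad.get(k, 0) / n_bad)
               for k in set(good) | set(bad))

perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
separated = score_overlap([610, 620, 480, 470], [0, 0, 1, 1])
identical = score_overlap([610, 620, 615, 605], [0, 1, 0, 1])
```

A lower overlap and a higher AUC both indicate that the model separates good and bad customers better.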
S35: deploy the scorecard model to the production environment in a language suited to that environment.
For example, the scorecard model is deployed to the production environment in a language suited to it, such as SQL or C. The model is then used to score new applicants, and the scoring result determines whether the user's application is approved.
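One way a scorecard might be carried into SQL is to render each binned variable as a CASE expression and sum the points. The sketch below only builds the query text; the table name `applications`, the key column `application_id`, the base points and the per-bin points are all hypothetical, and `kb_max` is borrowed from Table 3 purely as a column name.

```python
def scorecard_to_sql(base_points, scorecard, table="applications"):
    """scorecard: {column: [(upper_bound, points), ...]} with bins in ascending
    order; an upper_bound of None marks the catch-all last bin.
    Returns a SELECT statement that adds up the points of every variable."""
    terms = [str(base_points)]
    for column, bins in scorecard.items():
        branches = []
        for upper, points in bins:
            if upper is None:
                # Also reached for NULL inputs, because NULL <= x is not true.
                branches.append(f"ELSE {points}")
            else:
                branches.append(f"WHEN {column} <= {upper} THEN {points}")
        terms.append("CASE " + " ".join(branches) + " END")
    return f"SELECT application_id, {' + '.join(terms)} AS score FROM {table}"

sql = scorecard_to_sql(600, {"kb_max": [(10, -5), (None, 8)]})
```

Column and table names are interpolated as plain text, so a sketch like this is only appropriate when the scorecard definition comes from a trusted source.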
In the credit scoring method provided by the present invention, multi-channel information is selected in the data acquisition stage, such as telecom operator information, credit card information and debit card information; considering information about all aspects of the user enhances the predictive performance and stability of the model. Derived fields based on time slices extract information with real predictive value, which also improves the predictive performance of the model. In the data preparation stage, variables with a high missing rate and variables containing hard-to-interpret levels are deleted, which enhances the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, which enhances the stability of the model. Variables with real predictive power are selected using machine learning methods such as LASSO, which improves the predictive power of the final model.
In addition, the present invention also provides a credit scoring system which, as shown in Fig. 1, comprises: a data cleaning module for cleaning the collected raw data of credit card applicants and removing variables that do not meet preset conditions; a data classification module for binning the retained character-type variables; and a model construction module for building a scoring model, scoring new applicants, and determining from the scoring result whether to approve the user's application.
In binning, for character-type variables, the number of levels of every character-type variable is computed, and levels with few observations are merged with the level closest in meaning. If a variable has a large number of levels, the k-means clustering algorithm is used to merge the many-level variable into a few-level variable, with each level represented by its observation count and its proportion of bad customers. For numeric variables, optimal segmentation, also called supervised discretization, is used: recursive partitioning splits a continuous variable into segments, and the underlying algorithm searches for a good grouping based on conditional inference.
According to the binning results, the original variables are converted to WOE (weight of evidence). The WOE transformation allows the logistic regression model to be converted into scorecard format.
Preferably, with reference to Fig. 2, the model construction module comprises: an extraction module, which determines and extracts the data types required by the scoring model. In a preferred embodiment, the table structure and variable meanings of the information data are analyzed in detail, and potentially meaningful variables are derived from the raw data, for example derived variables based on time slices. Data are extracted from the selected data categories, and all data are merged on a unique key. For character-type variables with many levels, the levels can be merged with the k-means algorithm or with a decision tree algorithm. The model construction module further comprises: an extracted-data cleaning module, which cleans the extracted data to obtain consistently formatted candidate model variables; a data selection module, which selects among the cleaned candidate variables and retains variables whose explanatory power exceeds a first preset value; an evaluation module, which computes the model's evaluation metrics to assess its overall predictive power; and a deployment module, which deploys the scorecard model to the production environment in a language suited to that environment.
In addition, in this embodiment, as shown in Fig. 3, the extracted-data cleaning module comprises: a variable meaning analysis module, which analyzes the meaning of the variables and deletes post-loan variables and meaningless variables; a variable null-rate analysis module, which analyzes the null rate of the variables and deletes variables whose missing rate exceeds a second preset value; and a variable distribution analysis module, which analyzes the distribution of the variables and deletes single-level variables.
Further, as shown in Fig. 4, the data selection module comprises: a KS computing module, which computes the univariate KS of numeric variables and retains variables with a higher KS; an IV computing module, which computes the univariate IV and retains variables with a higher IV; a correlation coefficient computing module, which computes the correlation coefficients between variables and removes multicollinearity; and a variable selection module, which selects variables with the LASSO machine learning method and retains variables whose explanatory power exceeds a preset value.
The null-rate analysis and the level-count analysis are described in detail below:
Null-rate analysis: as shown in Table 1, compute the overall null rate of every variable and its null rate in each application month, and delete variables whose overall null rate is very high as well as variables whose null rate clearly depends on the application month. For categorical variables, delete those with a missing rate above 50%; for continuous variables, delete those with a missing rate above 30%.
Level-count analysis: compute the number of levels of every character-type variable, and delete single-level variables and variables that contain many hard-to-interpret levels.
In a specific embodiment, this is illustrated with reference to Tables 2 and 3:
Univariate IV and KS: as shown in Table 2, compute the IV of all candidate variables and the KS of all numeric variables, and rank the variables by IV and KS. The higher a variable's IV, the stronger its predictive power; variables with weak predictive power, whose IV is below 0.01, are deleted.
Correlation coefficients between variables: as shown in Table 3, compute the correlation coefficients between all candidate variables; where a coefficient exceeds the set threshold, delete the variable with the lower IV.
LASSO variable selection: the LASSO machine learning method is used to select the variables with stronger predictive power from all candidate variables. In this embodiment, the variable-selection stage may use either the LASSO method or least angle regression to select variables; this is not described further here.
Stepwise regression: the final variable selection is performed with logistic regression and stepwise regression, and variables that are not very significant in the stepwise regression are removed manually.
Preferably, if two variables are strongly correlated, one may retain the variable with the higher IV, or alternatively the variable with the more stable distribution or the one that is easier to explain from a business standpoint.
In a preferred embodiment, the deployment module is configured to score the test samples with the model to obtain the score distributions of good and bad customers, and to judge the model's predictive power from the degree of overlap between the two distributions; and to predict the test samples with the model to obtain a KS value or an AUC value, and to judge the model's predictive power from that value.
In the credit scoring system provided by the present invention, multi-channel information is selected in the data acquisition stage, such as telecom operator information, credit card information and debit card information; considering information about all aspects of the user enhances the predictive performance and stability of the model. Derived fields based on time slices extract information with real predictive value, which also improves the predictive performance of the model. In the data preparation stage, variables with a high missing rate and variables containing hard-to-interpret levels are deleted, which enhances the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, which enhances the stability of the model. Variables with real predictive power are selected using machine learning methods such as LASSO, which improves the predictive power of the final model.
Further, some specific embodiments of the present invention provide a kind of computer equipment, including memory, processor with
And the computer program that can be run on a memory and on a processor is stored, the processor is realized such as when executing described program
The upper method executed by terminal.
Below with reference to Fig. 5, it illustrates the computer equipments 500 for the terminal device for being suitable for being used to realize the embodiment of the present application
Structural schematic diagram.
As shown in figure 5, computer equipment 500 includes central processing unit (CPU) 501, it can be read-only according to being stored in
Program in memory (ROM) 502 is loaded into random access storage device (RAM) from storage section 508) program in 503
And execute various work appropriate and processing.In RAM503, also it is stored with system 500 and operates required various program sum numbers
According to.CPU501, ROM502 and RAM503 are connected with each other by bus 504.Input/output (I/O) interface 505 is also connected to
Bus 504.
I/O interface 505 is connected to lower component: the importation 506 including keyboard, mouse etc.;It is penetrated including such as cathode
The output par, c 507 of spool (CRT), liquid crystal display (LCD) etc. and loudspeaker etc.;Storage section 508 including hard disk etc.;
And including such as LAN card, the communications portion 509 of the network interface card of modem etc..Communications portion 509 via such as because
The network of spy's net executes communication process.Driver 510 is also connected to I/O interface 506 as needed.Detachable media 511, such as
Disk, CD, magneto-optic disk, semiconductor memory etc. are mounted on as needed on driver 510, in order to read from thereon
Computer program be mounted as needed such as storage section 508.
Particularly, according to an embodiment of the invention, may be implemented as computer above with reference to the process of flow chart description
Software program.For example, the embodiment of the present invention includes a kind of computer program product comprising be tangibly embodied in machine readable
Computer program on medium, the computer program include the program code for method shown in execution flow chart.At this
In the embodiment of sample, which can be downloaded and installed from network by communications portion 509, and/or from removable
Medium 511 is unloaded to be mounted.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of
possible implementations of systems, methods, and computer program products according to various embodiments
of the invention. In this regard, each block in a flowchart or block diagram may represent a module, a program
segment, or a portion of code, which contains one or more executable instructions for implementing the
specified logical function. It should also be noted that, in some alternative implementations, the functions
noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks
shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the
reverse order, depending on the functions involved. It should also be noted that each block in the block
diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated
hardware-based system that performs the specified functions or operations, or by a combination of dedicated
hardware and computer instructions.
Obviously, the above embodiments of the invention are merely examples given to illustrate the invention
clearly, and are not a limitation on its embodiments. Those of ordinary skill in the art may make other
variations or changes in different forms on the basis of the above description. It is not possible to list all
embodiments exhaustively here; any obvious variation or change derived from the technical solution of the
invention remains within the scope of protection of the invention.
Claims (12)
1. A credit scoring method, characterized in that the method comprises:
cleaning the collected raw data of credit card applicants, and deleting variables that do not satisfy a preset condition;
binning the retained character-type variables;
building a scoring model, scoring new applicants, and deciding whether to approve a user's application according to the scoring result.
2. The method according to claim 1, characterized in that building the scoring model, scoring new applicants,
and deciding whether to approve a user's application according to the scoring result comprises:
determining and extracting the data types required by the scoring model;
cleaning the extracted data to obtain well-formatted candidate model variables;
selecting among the cleaned candidate variables, and retaining the variables whose explanatory power is higher than a first preset value;
computing evaluation metrics of the model to assess the model's overall predictive ability;
deploying the scorecard model to the production environment using a language suited to the production environment.
3. The method according to claim 2, characterized in that cleaning the extracted data to obtain well-formatted
candidate model variables comprises:
analyzing the meaning of the variables, and deleting post-loan variables and meaningless variables;
analyzing the null-value rate of the variables, and deleting variables whose missing rate is higher than a second preset value;
analyzing the distribution of the variables, and deleting single-level variables.
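The missing-rate and single-level checks recited in claim 3 can be illustrated with a minimal Python sketch. This is not the patent's implementation: the function name `drop_unusable_variables`, the column-dictionary data layout, the use of `None` for missing values, and the 0.5 threshold standing in for the "second preset value" are all assumptions made for illustration. The semantic check (post-loan or meaningless variables) needs domain knowledge and is not covered.

```python
def drop_unusable_variables(columns, max_missing_rate=0.5):
    """Keep only variables that pass the missing-rate and single-level checks.

    columns: dict mapping a variable name to its list of values, where None
    marks a missing value. Every list is assumed to be non-empty.
    max_missing_rate: hypothetical stand-in for the "second preset value".
    """
    kept = {}
    for name, values in columns.items():
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate > max_missing_rate:
            continue  # missing rate is above the threshold
        levels = {v for v in values if v is not None}
        if len(levels) <= 1:
            continue  # single-level variable: it cannot separate customers
        kept[name] = values
    return kept


# Illustrative data only; the variable names are invented for this example.
sample = {
    "age": [25, 40, None, 33],                    # 25% missing, several levels
    "channel": ["app", "app", "app", "app"],      # single level
    "fax_number": [None, None, None, "021-000"],  # 75% missing
}
cleaned = drop_unusable_variables(sample, max_missing_rate=0.5)
print(list(cleaned))
```

With the sample above, only `age` survives: `channel` has a single level and `fax_number` exceeds the missing-rate threshold.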
4. The method according to claim 2, characterized in that selecting among the cleaned candidate variables comprises:
evaluating the univariate KS of each variable, and retaining the variables with higher KS;
calculating the univariate IV of each variable, and retaining the variables with higher IV;
calculating the correlation coefficients between variables, and removing multicollinearity;
selecting variables using the LASSO machine learning method, and retaining the variables whose explanatory power is higher than a preset value.
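The univariate KS and IV screening steps of claim 4 can be sketched as follows. The claim does not give formulas, so this sketch assumes the definitions commonly used in scorecard work: IV is the sum over bins of (good share minus bad share) times the log of their ratio, and KS is the largest gap between the cumulative good and bad shares across ordered bins. The function names and the per-bin count layout are hypothetical, and the correlation and LASSO steps are not shown.

```python
import math


def information_value(bins):
    """IV of one binned variable.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Every count is assumed to be positive so the logarithm is defined.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        good_share = good / total_good
        bad_share = bad / total_bad
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv


def univariate_ks(bins):
    """KS of one binned variable; bins must be ordered by the variable's value."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    cum_good = cum_bad = 0.0
    ks = 0.0
    for good, bad in bins:
        cum_good += good / total_good
        cum_bad += bad / total_bad
        ks = max(ks, abs(cum_good - cum_bad))
    return ks


# Illustrative counts only: three ordered bins of (good, bad) customers.
example_bins = [(10, 40), (30, 30), (60, 30)]
iv_value = information_value(example_bins)
ks_value = univariate_ks(example_bins)
print(round(iv_value, 4), round(ks_value, 4))
```

A variable whose good and bad customers are spread identically over the bins gets IV and KS of zero under these definitions, which is why higher values are taken as a sign of predictive value.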
5. The method according to claim 2, characterized in that computing evaluation metrics of the model to assess
the model's overall predictive ability comprises:
scoring the test sample with the model to obtain the score distributions of good and bad customers, and
judging the model's predictive ability according to the degree of overlap of the two distributions;
predicting on the test sample with the model to obtain a KS value or an AUC value, and judging the model's predictive ability according to that value.
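The KS and AUC evaluation recited in claim 5 can be sketched in plain Python. This is an illustrative computation on a test sample and not the patent's code: the function names, the label convention (1 for a bad customer, 0 for a good one), and the assumption that a higher score means higher predicted risk are all choices made for this example. AUC is computed as the fraction of (bad, good) pairs ranked correctly, with ties counted as half.

```python
def model_auc(scores, labels):
    """AUC as the probability that a bad customer outscores a good customer."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(bad) * len(good))


def model_ks(scores, labels):
    """Largest gap between the empirical score CDFs of bad and good customers."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    ks = 0.0
    for threshold in sorted(set(scores)):
        cdf_bad = sum(s <= threshold for s in bad) / len(bad)
        cdf_good = sum(s <= threshold for s in good) / len(good)
        ks = max(ks, abs(cdf_bad - cdf_good))
    return ks


# Illustrative test sample only.
test_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
test_labels = [0, 0, 0, 1, 0, 1, 1, 1]
auc_value = model_auc(test_scores, test_labels)
ks_value_model = model_ks(test_scores, test_labels)
print(auc_value, ks_value_model)
```

A small overlap between the two score distributions corresponds to a large KS and an AUC close to 1, which matches the claim's idea of judging predictive ability from how far apart the good and bad customers are scored.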
6. A credit scoring system, comprising:
a data cleaning module, configured to clean the collected raw data of credit card applicants and delete
variables that do not satisfy a preset condition;
a data classification module, configured to bin the retained character-type variables;
a model building module, configured to build a scoring model, score new applicants, and decide whether to
approve a user's application according to the scoring result.
7. The system according to claim 6, characterized in that the model building module comprises:
an extraction module, which determines and extracts the data types required by the scoring model;
an extracted-data cleaning module, which cleans the extracted data to obtain well-formatted candidate model variables;
a data selection module, which selects among the cleaned candidate variables and retains the variables whose
explanatory power is higher than a first preset value;
an evaluation module, which computes evaluation metrics of the model to assess the model's overall predictive ability;
a deployment module, which deploys the scorecard model to the production environment using a language suited to the production environment.
8. The system according to claim 7, characterized in that the extracted-data cleaning module comprises:
a variable meaning analysis module, which analyzes the meaning of the variables and deletes post-loan variables and meaningless variables;
a variable null-value rate analysis module, which analyzes the null-value rate of the variables and deletes variables whose missing rate is higher than a second preset value;
a variable distribution analysis module, which analyzes the distribution of the variables and deletes single-level variables.
9. The system according to claim 7, characterized in that the data selection module comprises:
a KS calculation module, which evaluates the univariate KS of each variable and retains the variables with higher KS;
an IV calculation module, which calculates the univariate IV of each variable and retains the variables with higher IV;
a correlation coefficient calculation module, which calculates the correlation coefficients between variables and removes multicollinearity;
a variable selection module, which selects variables using the LASSO machine learning method and retains the
variables whose explanatory power is higher than a preset value.
10. The system according to claim 6, characterized in that the deployment module is configured to:
score the test sample with the model to obtain the score distributions of good and bad customers, and judge
the model's predictive ability according to the degree of overlap of the two distributions;
predict on the test sample with the model to obtain a KS value or an AUC value, and judge the model's predictive ability according to that value.
11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and
executable on the processor, characterized in that
the processor, when executing the program, implements the method according to any one of claims 1-5.
12. A computer-readable medium storing a computer program, characterized in that the program, when executed by
a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810947751.2A CN109087196A (en) | 2018-08-20 | 2018-08-20 | Credit-graded approach, system, computer equipment and readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810947751.2A CN109087196A (en) | 2018-08-20 | 2018-08-20 | Credit-graded approach, system, computer equipment and readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109087196A (en) | 2018-12-25 |
Family
ID=64794071
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810947751.2A (Pending), CN109087196A (en) | 2018-08-20 | 2018-08-20 | Credit-graded approach, system, computer equipment and readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109087196A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135467A (en) * | 2019-04-23 | 2019-08-16 | 北京淇瑀信息科技有限公司 | A kind of model training method, device, system and recording medium based on data splicing |
CN110196797A (en) * | 2019-06-06 | 2019-09-03 | 苏宁消费金融有限公司 | Automatic optimization method and system suitable for credit scoring card system |
CN110335134A (en) * | 2019-04-15 | 2019-10-15 | 梵界信息技术(上海)股份有限公司 | A method of it is converted based on WOE and realizes the classification of credit customer qualification |
CN110659817A (en) * | 2019-09-16 | 2020-01-07 | 上海云从企业发展有限公司 | Data processing method and device, machine readable medium and equipment |
CN111861704A (en) * | 2020-07-10 | 2020-10-30 | 深圳无域科技技术有限公司 | Wind control feature generation method and system |
CN112529477A (en) * | 2020-12-29 | 2021-03-19 | 平安普惠企业管理有限公司 | Credit evaluation variable screening method, device, computer equipment and storage medium |
CN112862593A (en) * | 2021-01-28 | 2021-05-28 | 深圳前海微众银行股份有限公司 | Credit scoring card model training method, device, system and computer storage medium |
CN112906723A (en) * | 2019-11-19 | 2021-06-04 | 北京京邦达贸易有限公司 | Feature selection method and device |
CN113222632A (en) * | 2020-02-04 | 2021-08-06 | 北京京东振世信息技术有限公司 | Object mining method and device |
CN116012143A (en) * | 2023-01-03 | 2023-04-25 | 睿智合创(北京)科技有限公司 | Variable selection and parameter estimation method under case-division regression |
CN114595244B (en) * | 2022-03-11 | 2023-10-17 | 抖音视界有限公司 | Method and device for aggregating crash data, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106408411A (en) * | 2016-08-31 | 2017-02-15 | 北京城市网邻信息技术有限公司 | Credit assessment method and device |
CN106897918A (en) * | 2017-02-24 | 2017-06-27 | 上海易贷网金融信息服务有限公司 | A kind of hybrid machine learning credit scoring model construction method |
CN108022146A (en) * | 2017-11-14 | 2018-05-11 | 深圳市牛鼎丰科技有限公司 | Characteristic item processing method, device, the computer equipment of collage-credit data |
Non-Patent Citations (2)
Title |
---|
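Zhang Wanjun: "Research on a Personal Credit Risk Assessment Model Based on Big Data", China Doctoral Dissertations Full-text Database, Economics and Management Sciences * |
Xiao Ming: "An Empirical Study of Knowledge Mapping in Library and Information Science Abroad", 31 March 2018 * |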
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110335134A (en) * | 2019-04-15 | 2019-10-15 | 梵界信息技术(上海)股份有限公司 | A method of it is converted based on WOE and realizes the classification of credit customer qualification |
CN110135467A (en) * | 2019-04-23 | 2019-08-16 | 北京淇瑀信息科技有限公司 | A kind of model training method, device, system and recording medium based on data splicing |
CN110196797B (en) * | 2019-06-06 | 2022-08-02 | 苏宁消费金融有限公司 | Automatic optimization method and system suitable for credit scoring card system |
CN110196797A (en) * | 2019-06-06 | 2019-09-03 | 苏宁消费金融有限公司 | Automatic optimization method and system suitable for credit scoring card system |
CN110659817A (en) * | 2019-09-16 | 2020-01-07 | 上海云从企业发展有限公司 | Data processing method and device, machine readable medium and equipment |
CN112906723A (en) * | 2019-11-19 | 2021-06-04 | 北京京邦达贸易有限公司 | Feature selection method and device |
CN112906723B (en) * | 2019-11-19 | 2024-01-16 | 北京京邦达贸易有限公司 | Feature selection method and device |
CN113222632A (en) * | 2020-02-04 | 2021-08-06 | 北京京东振世信息技术有限公司 | Object mining method and device |
CN111861704A (en) * | 2020-07-10 | 2020-10-30 | 深圳无域科技技术有限公司 | Wind control feature generation method and system |
CN112529477A (en) * | 2020-12-29 | 2021-03-19 | 平安普惠企业管理有限公司 | Credit evaluation variable screening method, device, computer equipment and storage medium |
CN112862593A (en) * | 2021-01-28 | 2021-05-28 | 深圳前海微众银行股份有限公司 | Credit scoring card model training method, device, system and computer storage medium |
CN112862593B (en) * | 2021-01-28 | 2024-05-03 | 深圳前海微众银行股份有限公司 | Credit scoring card model training method, device and system and computer storage medium |
CN114595244B (en) * | 2022-03-11 | 2023-10-17 | 抖音视界有限公司 | Method and device for aggregating crash data, electronic equipment and storage medium |
CN116012143A (en) * | 2023-01-03 | 2023-04-25 | 睿智合创(北京)科技有限公司 | Variable selection and parameter estimation method under case-division regression |
CN116012143B (en) * | 2023-01-03 | 2023-10-13 | 睿智合创(北京)科技有限公司 | Variable selection and parameter estimation method under case-division regression |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109087196A (en) | Credit-graded approach, system, computer equipment and readable medium | |
CN107330445B (en) | User attribute prediction method and device | |
JP7090936B2 (en) | ESG-based corporate evaluation execution device and its operation method | |
Almana et al. | A survey on data mining techniques in customer churn analysis for telecom industry | |
CN110223168A (en) | A kind of anti-fraud detection method of label propagation and system based on business connection map | |
CN113935434A (en) | Data analysis processing system and automatic modeling method | |
CN110008259A (en) | The method and terminal device of visualized data analysis | |
CN106503863A (en) | Based on the Forecasting Methodology of the age characteristicss of decision-tree model, system and terminal | |
CN105574544A (en) | Data processing method and device | |
CN111199474A (en) | Risk prediction method and device based on network diagram data of two parties and electronic equipment | |
CN111325619A (en) | Credit card fraud detection model updating method and device based on joint learning | |
CN109495479A (en) | A kind of user's abnormal behaviour recognition methods and device | |
CN107145516A (en) | A kind of Text Clustering Method and system | |
CN110046889A (en) | A kind of detection method, device and the server of abnormal behaviour main body | |
CN110084627A (en) | The method and apparatus for predicting target variable | |
CN108228622A (en) | The sorting technique and device of traffic issues | |
CN113239268B (en) | Commodity recommendation method, device and system | |
CN107908616A (en) | The method and apparatus of anticipation trend word | |
CN108846695A (en) | The prediction technique and device of terminal replacement cycle | |
CN110727740B (en) | Correlation analysis method and device, computer equipment and readable medium | |
CN107679209B (en) | Classification expression generation method and device | |
CN112884569A (en) | Credit assessment model training method, device and equipment | |
CN109102396A (en) | A kind of user credit ranking method, computer equipment and readable medium | |
CN112016855A (en) | User industry identification method and device based on relational network matching and electronic equipment | |
CN106919997A (en) | A kind of customer consumption Forecasting Methodology of the ecommerce based on LDA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181225 |