CN109087196A - Credit scoring method, system, computer device and readable medium - Google Patents


Info

Publication number
CN109087196A
Authority
CN
China
Prior art keywords
variable
model
data
module
variables
Legal status
Pending
Application number
CN201810947751.2A
Other languages
Chinese (zh)
Inventor
肖尊雷
赵钢
庞闪闪
刘婷婷
康丽娜
李翠静
Current Assignee
Beijing Jiufu Pratt & Whitney Information Technology Co Ltd
Original Assignee
Beijing Jiufu Pratt & Whitney Information Technology Co Ltd
Application filed by Beijing Jiufu Pratt & Whitney Information Technology Co Ltd
Priority to CN201810947751.2A
Publication of CN109087196A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a credit scoring method, system, computer device and readable medium. In the data acquisition stage, multi-channel information such as telecom operator data, credit card data and debit card data is selected; considering information about many aspects of the user improves the predictive performance and stability of the model. Derived fields based on time slices extract information with genuine predictive value and further improve predictive performance. In the data preparation stage, variables with high missing rates and variables containing levels that are hard to interpret are deleted, which improves the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, further improving stability. Machine learning methods such as LASSO are used to select the variables with genuine predictive ability, improving the predictive ability of the final model.

Description

Credit scoring method, system, computer device and readable medium
Technical field
The present invention relates to credit scoring, and more particularly to a credit scoring method, system, computer device and readable medium.
Background art
Credit rating, also called "credit grading" or "creditworthiness rating", is an important part and foundation of a social credit system. Traditional credit rating methods are mostly based on expert judgment or scorecard models: a set of scoring rules is first specified from bankers' experience, and the rules are then applied to a user's actual data to produce a score. Chinese invention patent application No. 201710197889.0 discloses a patent entitled "Loan user credit rating method and system based on machine learning". That patent discloses a credit rating method that mainly comprises: collecting modeling sample data to obtain merchants' credit reports and whether they were overdue; preprocessing the credit report data, including data extraction and indicator subdivision, to obtain predictive variables and their weights; modeling the sample data with a machine learning method to obtain a prediction model; predicting new loan users with the prediction model to obtain their default probability; and scoring each new user from that default probability to obtain the user's credit score. Because that invention builds its model with a machine learning algorithm, the model can iterate quickly on data from entirely new users, and it can be widely applied in the computer field. However, that patent still has several drawbacks, chiefly in the selection of the final predictive variables. It analyzes 143 predictive variables extracted from credit reports along five dimensions: credit limit, recent behavior, credit duration, number of accounts, and repayment history. To reduce computation and speed up prediction, it keeps 7 of the 143 as final model predictive variables: credit card utilization relative to the average granted limit, time since the most recent credit card repayment, number of inquiries in the last 24 months, time since the most recent credit card was opened, time since the earliest credit card was opened, number of inquiries in the last 3 months, and number of inquiries in the last 6 months. That patent does not state which variable selection method it used, and the selected variables do not cover the five dimensions well. Judging from their meanings, these 7 predictive variables exhibit some multicollinearity (for example, the inquiry counts over overlapping 3-, 6- and 24-month windows), so a model built on them may be unstable. Therefore, traditional credit rating methods still need improvement.
Summary of the invention
In view of this, to address the defects that remain in traditional credit rating methods, the present invention adopts the following technical solutions:
A first aspect of the present invention provides a credit scoring method, characterized in that the method comprises:
cleaning the collected raw data of credit card applicants and deleting variables that do not meet preset conditions;
binning the retained character-type (categorical) variables;
building a scoring model, scoring new applicants, and determining from the scoring result whether to approve each application.
Preferably, building the scoring model, scoring new applicants, and deciding from the scoring result whether to approve an application comprises:
determining and extracting the data types required by the scoring model;
cleaning the extracted data to obtain well-formatted model candidate variables;
selecting among the cleaned candidate variables and retaining variables whose interpretability is higher than a first preset value;
calculating evaluation metrics of the model to assess its overall predictive ability;
deploying the scorecard model to the production environment in a language suited to that environment.
Preferably, cleaning the extracted data to obtain well-formatted model candidate variables comprises:
analyzing the meaning of each variable and deleting post-loan variables and meaningless variables;
analyzing the null rate of each variable and deleting variables whose missing rate is higher than a second preset value;
analyzing the distribution of each variable and deleting single-level variables.
Preferably, selecting among the cleaned candidate variables comprises:
calculating the univariate KS of numeric variables and retaining variables with higher KS;
calculating the univariate IV and retaining variables with higher IV;
calculating correlation coefficients between variables to remove multicollinearity;
selecting variables with the LASSO machine learning method and retaining variables whose interpretability is higher than a preset value.
Preferably, calculating the evaluation metrics of the model and assessing its overall predictive ability comprises:
scoring a test sample with the model to obtain the score distributions of good and bad customers, and judging the model's predictive ability from the degree of overlap between the two distributions;
predicting the test sample with the model to obtain a KS or AUC value, and judging the model's predictive ability from that value.
A second aspect of the present invention provides a credit scoring system, comprising:
a data cleaning module, configured to clean the collected raw data of credit card applicants and delete variables that do not meet preset conditions;
a data classification module, configured to bin the retained character-type variables;
a model construction module, configured to build a scoring model, score new applicants, and determine from the scoring result whether to approve each application.
Preferably, the model construction module comprises:
an extraction module, which determines and extracts the data types required by the scoring model;
an extracted-data cleaning module, which cleans the extracted data to obtain well-formatted model candidate variables;
a data selection module, which selects among the cleaned candidate variables and retains variables whose interpretability is higher than the first preset value;
an evaluation module, which calculates evaluation metrics of the model to assess its overall predictive ability;
a deployment module, which deploys the scorecard model to the production environment in a language suited to that environment.
Preferably, the extracted-data cleaning module comprises:
a variable meaning analysis module, which analyzes the meaning of each variable and deletes post-loan variables and meaningless variables;
a variable null rate analysis module, which analyzes the null rate of each variable and deletes variables whose missing rate is higher than the second preset value;
a variable distribution analysis module, which analyzes the distribution of each variable and deletes single-level variables.
Preferably, the data selection module comprises:
a KS calculation module, which calculates the univariate KS of numeric variables and retains variables with higher KS;
an IV calculation module, which calculates the univariate IV and retains variables with higher IV;
a correlation coefficient calculation module, which calculates correlation coefficients between variables to remove multicollinearity;
a variable selection module, which selects variables with the LASSO machine learning method and retains variables whose interpretability is higher than a preset value.
Preferably, the deployment module is configured to:
score a test sample with the model to obtain the score distributions of good and bad customers, and judge the model's predictive ability from the degree of overlap between the two distributions;
predict the test sample with the model to obtain a KS or AUC value, and judge the model's predictive ability from that value.
A third aspect of the present invention provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor,
wherein the processor implements the method described above when executing the program.
A fourth aspect of the present invention provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described above.
The beneficial effects of the present invention are as follows:
The present invention provides a credit scoring method, system, computer device and readable medium. In the data acquisition stage, multi-channel information such as telecom operator data, credit card data and debit card data is selected; considering information about many aspects of the user improves the predictive performance and stability of the model. Derived fields based on time slices extract information with genuine predictive value and further improve predictive performance. In the data preparation stage, variables with high missing rates and variables containing levels that are hard to interpret are deleted, which improves the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, further improving stability. Machine learning methods such as LASSO are used to select the variables with genuine predictive ability, improving the predictive ability of the final model.
Brief description of the drawings
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of the credit scoring system in an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the model construction module in Fig. 1.
Fig. 3 is a schematic structural diagram of the extracted-data cleaning module in Fig. 2.
Fig. 4 is a schematic structural diagram of the data selection module in Fig. 2.
Fig. 5 is a schematic structural diagram of a computer device suitable for implementing the terminal device of the embodiments of the present application.
Detailed description of embodiments
To explain the present invention more clearly, it is further described below with reference to preferred embodiments and the accompanying drawings. Similar components are denoted by the same reference numerals in the drawings. Those skilled in the art should understand that the content described below is illustrative rather than restrictive and should not be taken to limit the scope of protection of the present invention.
The present invention provides a credit scoring method, which specifically comprises:
S1: cleaning the collected raw data of credit card applicants and deleting variables that do not meet preset conditions.
S2: binning the retained character-type variables.
Specifically, for character-type variables, the number of levels of each variable is calculated, and levels with few records are merged with the level closest to them in meaning. If a variable has many levels, the kmeans clustering algorithm is used to merge the multi-level variable into a variable with fewer levels, where each level is represented by its record count and its proportion of bad customers. For numeric variables, optimal binning, also called supervised discretization, is used: recursive partitioning divides the continuous variable into segments, and the underlying algorithm searches for a good grouping based on conditional inference.
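The k-means level-merging step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes label 1 marks a bad customer and describes each level by its share of records (a normalized stand-in for the raw count, so both coordinates lie in [0, 1]) and its bad rate. It also uses a hand-written k-means with deterministic initialization. The function names `level_features` and `kmeans_merge` are hypothetical.

```python
from collections import defaultdict


def level_features(values, labels):
    """Describe each level by (share of records, bad rate); label 1 = bad customer."""
    counts, bads = defaultdict(int), defaultdict(int)
    for value, label in zip(values, labels):
        counts[value] += 1
        bads[value] += label
    n = len(values)
    return {lvl: (counts[lvl] / n, bads[lvl] / counts[lvl]) for lvl in counts}


def kmeans_merge(features, k, max_iter=50):
    """Cluster levels into k groups with plain k-means; returns {level: group id}."""
    # Deterministic start: spread initial centroids across levels ordered by bad rate.
    by_bad = sorted(features, key=lambda lvl: features[lvl][1])
    centroids = [features[by_bad[int(i * (len(by_bad) - 1) / max(k - 1, 1))]] for i in range(k)]

    def nearest(point):
        return min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))

    assign = {}
    for _ in range(max_iter):
        new_assign = {lvl: nearest(features[lvl]) for lvl in by_bad}
        if new_assign == assign:  # converged: no level changed group
            break
        assign = new_assign
        for c in range(k):
            members = [features[lvl] for lvl in by_bad if assign[lvl] == c]
            if members:  # keep the old centroid if a group becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign
```

Levels that receive the same group id would then be recoded as a single level before the WOE conversion. The supervised (conditional-inference) binning of numeric variables is not shown here.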
Based on the binning results, each original variable is converted into its WOE (weight of evidence) values. The WOE transformation allows the logistic regression model to be converted into scorecard format.
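The sketch below illustrates the WOE conversion and shows how it yields a scorecard. It makes several assumptions: WOE is defined as ln(share of goods / share of bads), the logistic model predicts the log-odds of being bad from WOE inputs, and the score follows a common scaling convention (a base score of 600 at good:bad odds of 50:1, with odds doubling every 20 points). None of these values come from the patent, and `woe_table` and `scorecard_points` are illustrative names.

```python
import math


def woe_table(bin_counts):
    """bin_counts: {bin: (n_good, n_bad)} -> {bin: WOE}, with
    WOE = ln(share of all goods in the bin / share of all bads in the bin)."""
    total_good = sum(g for g, _ in bin_counts.values())
    total_bad = sum(b for _, b in bin_counts.values())
    return {
        # The 0.5 floor avoids log(0) for bins with no goods or no bads.
        label: math.log((max(g, 0.5) / total_good) / (max(b, 0.5) / total_bad))
        for label, (g, b) in bin_counts.items()
    }


def scorecard_points(woe, beta, intercept, n_vars, base_score=600, base_odds=50, pdo=20):
    """Turn one variable's WOE bins into points. Assumes the logistic model is
    ln(odds of bad) = intercept + sum_j beta_j * WOE_j, on a scale where good:bad
    odds of base_odds score base_score and the odds double every pdo points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    # Intercept and offset are spread evenly across the n_vars variables in the model.
    return {label: offset / n_vars - factor * (beta * w + intercept / n_vars)
            for label, w in woe.items()}
```

Summing one bin's points from every variable gives offset + factor · ln(good:bad odds). This is why WOE inputs let the logistic model be read as an additive points table.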
S3: building a scoring model, scoring new applicants, and determining from the scoring result whether to approve each application.
Specifically, building the scoring model, scoring new applicants, and deciding from the scoring result whether to approve an application comprises:
S31: determining and extracting the data types required by the scoring model;
Specifically, the table structures and variable meanings of the information data are analyzed in detail, and potentially meaningful variables are derived from the raw data, for example derivations based on time slices. Data are extracted from the selected data categories, and all data are merged by a unique key.
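One possible shape for the time-slice derivation and the key-based merge is sketched below. The event layout `(user_id, months_before_application, amount)`, the 1/3/6-month windows and the `cnt_`/`sum_` column names are assumptions made for illustration. The patent states only that derived fields are based on time slices and that tables are merged on a unique key.

```python
from collections import defaultdict


def time_slice_features(events, windows=(1, 3, 6)):
    """events: (user_id, months_before_application, amount); 0 = the application month.
    Returns {user_id: {"cnt_m3": ..., "sum_m3": ..., ...}} for each look-back window."""
    def empty_row():
        return {f"{kind}_m{w}": 0 for w in windows for kind in ("cnt", "sum")}

    feats = defaultdict(empty_row)
    for uid, months_ago, amount in events:
        for w in windows:
            if months_ago < w:  # the event falls inside the last w months
                feats[uid][f"cnt_m{w}"] += 1
                feats[uid][f"sum_m{w}"] += amount
    return dict(feats)


def merge_on_key(*tables):
    """Outer-join {user_id: {column: value}} tables on the unique key
    (later tables win if two tables share a column name)."""
    merged = defaultdict(dict)
    for table in tables:
        for uid, row in table.items():
            merged[uid].update(row)
    return dict(merged)
```

For example, a telecom recharge log and a credit card table keyed by the same hypothetical `user_id` could be combined with `merge_on_key(time_slice_features(recharges), card_table)`.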
Preferably, for character-type variables with many levels, the levels may be merged either with the kmeans algorithm or with a decision tree algorithm.
S32: cleaning the extracted data to obtain well-formatted model candidate variables.
Specifically, the meaning of each variable is analyzed, and post-loan variables and meaningless variables are deleted; the null rate of each variable is analyzed, and variables with high missing rates are deleted; the distribution of each variable is analyzed, and single-level variables are deleted.
The null rate analysis and the level count analysis are described in detail below:
Null rate analysis: as shown in Table 1, the overall null rate of every variable and its null rate in each application month are calculated. Variables with a very high overall null rate, and variables whose null rate is clearly related to the application month, are deleted. For categorical variables, variables with a missing rate above 50% are deleted; for continuous variables, variables with a missing rate above 30% are deleted.
Table 1
Variable                  Null rate
nets_bill_month_num       0.2%
base_fee_month1           0.3%
location_type_freq        6.2%
recharge_month_num        7.2%
recharge_month_num_min    7.2%
Level count analysis: the number of levels of every character-type variable is calculated, and single-level variables and variables containing many hard-to-interpret levels are deleted.
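The two screening rules can be expressed as a short filter. This sketch assumes that rows are dictionaries with `None` marking a missing value and that each variable is pre-labelled as categorical or numeric. The 50% and 30% thresholds are the ones given above. The per-month null-rate check and the judgement about hard-to-interpret levels are left out. Applying the single-level rule to numeric variables as well is an extension of the text.

```python
def missing_rate(rows, var):
    """Share of rows in which var is missing (None or absent)."""
    return sum(1 for r in rows if r.get(var) is None) / len(rows)


def clean_variables(rows, var_types, cat_max_missing=0.5, num_max_missing=0.3):
    """Return the variables that survive null-rate and single-level screening.
    var_types: {variable: "categorical" | "numeric"}."""
    kept = []
    for var, vtype in var_types.items():
        limit = cat_max_missing if vtype == "categorical" else num_max_missing
        if missing_rate(rows, var) > limit:  # "missing rate above X%" -> delete
            continue
        levels = {r[var] for r in rows if r.get(var) is not None}
        if len(levels) <= 1:  # single-level (constant) variable carries no information
            continue
        kept.append(var)
    return kept
```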
S33: selecting among the cleaned candidate variables and retaining variables whose interpretability is higher than the first preset value.
Specifically, the cleaned candidate variables are screened so that variables with strong interpretability and stability are retained: the univariate KS of numeric variables is calculated and variables with higher KS are retained; the univariate IV is calculated and variables with higher IV are retained; correlation coefficients between variables are calculated to remove multicollinearity; and variables are selected with the LASSO machine learning method, retaining those whose interpretability is higher than a preset value.
A specific embodiment is illustrated with Tables 2 and 3:
Univariate IV and KS: as shown in Table 2, the IV of every candidate variable and the KS of every numeric variable are calculated, and the variables are ranked by IV and KS. The higher a variable's IV, the stronger its predictive ability; weakly predictive variables with IV below 0.01 are deleted.
Table 2
Variable            IV      KS
scorepettycashv1    0.0858  0.0781
scorelargecashv2    0.0811  0.0818
score_20            0.0746  0.1112
scorecreditbt       0.0594  0.1022
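The IV and KS statistics in Table 2 can be computed along the following lines. This is a minimal sketch with several assumptions: label 1 = bad, bins already formed and given as `(n_good, n_bad)` counts, and a 0.5 floor on empty cells to avoid log(0). The 0.01 cut-off is the one stated above, and the function names are hypothetical.

```python
import math


def information_value(bin_counts):
    """bin_counts: [(n_good, n_bad), ...] per bin. IV = sum((g% - b%) * ln(g% / b%))."""
    total_good = sum(g for g, _ in bin_counts)
    total_bad = sum(b for _, b in bin_counts)
    iv = 0.0
    for g, b in bin_counts:
        pg = max(g, 0.5) / total_good  # 0.5 floor avoids log(0) for empty cells
        pb = max(b, 0.5) / total_bad
        iv += (pg - pb) * math.log(pg / pb)
    return iv


def ks_statistic(values, labels):
    """Largest gap between the cumulative bad and good distributions, scanning values
    in ascending order (label 1 = bad; both classes must be present)."""
    pairs = sorted(zip(values, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume every record with the same value before measuring (handles ties).
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


def drop_weak_iv(iv_by_var, threshold=0.01):
    """Keep variables whose IV is at least the threshold (IV < 0.01 counts as weak)."""
    return [var for var, iv in iv_by_var.items() if iv >= threshold]
```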
Related coefficient between variable: as shown in table 3, the related coefficient between all candidate variables is calculated, it is big deletes related coefficient In the lesser variable of the IV of given threshold.
Table 3
            kb_max    duration  fen_max   nets_avg  nets_diff
kb_max      1         0.003062  0.207013  0.080726  0.105188
duration    0.003062  1         0.05663   0.00855   0.002891
fen_max     0.207013  0.05663   1         0.070467  0.070851
nets_avg    0.080726  0.00855   0.070467  1         0.368271
nets_diff   0.105188  0.002891  0.070851  0.368271  1
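The pairwise correlation screen might look like the sketch below. It uses Pearson correlation and visits pairs from most to least correlated. For any pair above the threshold, it drops the member with the lower IV. The 0.7 default is a hypothetical value standing in for the unspecified "set threshold". Constant columns (zero variance) are assumed to have been removed earlier.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(columns, iv, threshold=0.7):
    """columns: {var: values}; iv: {var: IV}. For each pair with |r| > threshold,
    drop the member with the lower IV, visiting the most correlated pairs first."""
    corr = {(a, b): abs(pearson(columns[a], columns[b]))
            for a, b in combinations(sorted(columns), 2)}
    dropped = set()
    for (a, b), r in sorted(corr.items(), key=lambda item: -item[1]):
        if r <= threshold or a in dropped or b in dropped:
            continue
        dropped.add(a if iv[a] < iv[b] else b)
    return [var for var in columns if var not in dropped]
```

As noted below, a practitioner may instead keep the more stable or more explainable member of a correlated pair. That would only change the rule inside `dropped.add(...)`.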
LASSO variable selection: the LASSO machine learning method is used to select the variables with stronger predictive ability from all candidate variables. In this embodiment, variables may be selected at this stage either with the LASSO method or with least angle regression (LAR); the details are not repeated here.
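The sketch below shows how LASSO zeroes out coefficients, using coordinate-descent LASSO on the squared-error loss. It rests on simplifying assumptions. The patent does not state the loss or the solver, and a production credit model would more likely use an L1-penalized logistic regression. Least angle regression is not shown. The columns of `X` and the target `y` are assumed to be centred, so no intercept is fitted.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min (1/2n)||y - Xb||^2 + lam * ||b||_1.
    X: list of rows with centred columns; y: centred target (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


def selected_features(names, coefs):
    """Variables whose LASSO coefficient survived the L1 penalty."""
    return [name for name, c in zip(names, coefs) if c != 0.0]
```

Raising `lam` drives more coefficients to exactly zero, which is what makes LASSO usable as a variable selector rather than only a shrinkage method.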
Stepwise regression: the final variables are selected with logistic regression and stepwise regression, and variables that are not sufficiently significant in the stepwise regression are removed manually.
Preferably, if two variables are strongly correlated, either the variable with the higher IV may be retained, or the variable whose distribution is more stable or which is easier to explain in business terms may be retained.
S34: calculating evaluation metrics of the model to assess its overall predictive ability.
In a specific embodiment, a test sample is scored with the model to obtain the score distributions of good and bad customers, and the model's predictive ability is judged from the degree of overlap between the two distributions; the test sample is also predicted with the model to obtain a KS or AUC value, from which the model's predictive ability is judged.
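The evaluation metrics can be sketched as follows. AUC is computed by pairwise comparison, which is equivalent to the Mann-Whitney statistic. To quantify the overlap of the good and bad score distributions, the sketch uses the overlap coefficient of their histograms, one possible measure among several. Both functions assume label 1 = bad and a score that increases with risk, so scorecard points, which rise as risk falls, would be negated first. The 10-bin histogram is an arbitrary choice, and the O(n²) AUC loop is intended only for small samples.

```python
def auc(scores, labels):
    """P(random bad scores higher than random good), ties counted as 1/2; label 1 = bad."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def distribution_overlap(scores, labels, n_bins=10):
    """Overlap coefficient of the bad and good score histograms:
    0 = fully separated, 1 = identical distributions."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal scores

    def histogram(group):
        counts = [0] * n_bins
        for s in group:
            counts[min(int((s - lo) / width), n_bins - 1)] += 1
        return [c / len(group) for c in counts]

    h_bad = histogram([s for s, y in zip(scores, labels) if y == 1])
    h_good = histogram([s for s, y in zip(scores, labels) if y == 0])
    return sum(min(a, b) for a, b in zip(h_bad, h_good))
```

A well-separating model shows AUC close to 1 and overlap close to 0. A model no better than chance shows AUC near 0.5 and substantial overlap.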
S35: deploying the scorecard model to the production environment in a language suited to that environment.
For example, the scorecard model is deployed to the production environment in a suitable language such as SQL or C. The model is then used to score new applicants, and whether to approve each application is decided from the scoring result.
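Deploying a scorecard in SQL often amounts to emitting one CASE expression per variable. The generator below is a hypothetical sketch. The bin layout (ascending upper bounds ending with an open-ended bin), the `user_id` and `credit_score` column names and the table name are all assumptions, not details taken from the patent.

```python
def bin_to_sql(var, bins, null_points=None):
    """bins: [(upper_bound, points), ..., (None, points)] sorted by upper bound.
    Returns a SQL CASE expression awarding the points of the bin the value falls into."""
    parts = ["CASE"]
    if null_points is not None:
        parts.append(f"WHEN {var} IS NULL THEN {null_points}")
    for upper, points in bins:
        if upper is None:  # open-ended last bin
            parts.append(f"ELSE {points}")
        else:
            parts.append(f"WHEN {var} < {upper} THEN {points}")
    parts.append("END")
    return " ".join(parts)


def scorecard_sql(scorecard, base_points, table):
    """scorecard: {variable: bins}. Sums base points and every variable's CASE expression."""
    terms = " + ".join(bin_to_sql(var, bins) for var, bins in scorecard.items())
    return f"SELECT user_id, {base_points} + {terms} AS credit_score FROM {table}"
```

Because comparisons with NULL are never true in SQL, a missing value would fall into the ELSE bin unless a dedicated `IS NULL` branch is supplied through `null_points`.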
In the credit scoring method provided by the present invention, multi-channel information such as telecom operator data, credit card data and debit card data is selected in the data acquisition stage; considering information about many aspects of the user improves the predictive performance and stability of the model. Derived fields based on time slices extract information with genuine predictive value and further improve predictive performance. In the data preparation stage, variables with high missing rates and variables containing levels that are hard to interpret are deleted, which improves the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, further improving stability. Machine learning methods such as LASSO are used to select the variables with genuine predictive ability, improving the predictive ability of the final model.
In addition, the present invention also provides a credit scoring system which, referring to Fig. 1, comprises: a data cleaning module, configured to clean the collected raw data of credit card applicants and delete variables that do not meet preset conditions; a data classification module, configured to bin the retained character-type variables; and a model construction module, configured to build a scoring model, score new applicants, and determine from the scoring result whether to approve each application.
In data binning, for character-type variables, the number of levels of each variable is calculated, and levels with few records are merged with the level closest to them in meaning. If a variable has many levels, the kmeans clustering algorithm is used to merge the multi-level variable into a variable with fewer levels, where each level is represented by its record count and its proportion of bad customers. For numeric variables, optimal binning, also called supervised discretization, is used: recursive partitioning divides the continuous variable into segments, and the underlying algorithm searches for a good grouping based on conditional inference.
Based on the binning results, each original variable is converted into its WOE values. The WOE transformation allows the logistic regression model to be converted into scorecard format.
Preferably, referring to Fig. 2, the model construction module comprises an extraction module, which determines and extracts the data types required by the scoring model. In a preferred embodiment, the table structures and variable meanings of the information data are analyzed in detail, and potentially meaningful variables are derived from the raw data, for example derivations based on time slices; data are extracted from the selected data categories, and all data are merged by a unique key. For character-type variables with many levels, the levels may be merged either with the kmeans algorithm or with a decision tree algorithm. The model construction module further comprises: an extracted-data cleaning module, which cleans the extracted data to obtain well-formatted model candidate variables; a data selection module, which selects among the cleaned candidate variables and retains variables whose interpretability is higher than the first preset value; an evaluation module, which calculates evaluation metrics of the model to assess its overall predictive ability; and a deployment module, which deploys the scorecard model to the production environment in a language suited to that environment.
In addition, in this embodiment, as shown in Fig. 3, the extracted-data cleaning module comprises: a variable meaning analysis module, which analyzes the meaning of each variable and deletes post-loan variables and meaningless variables; a variable null rate analysis module, which analyzes the null rate of each variable and deletes variables whose missing rate is higher than the second preset value; and a variable distribution analysis module, which analyzes the distribution of each variable and deletes single-level variables.
Further, as shown in Fig. 4, the data selection module comprises: a KS calculation module, which calculates the univariate KS of numeric variables and retains variables with higher KS; an IV calculation module, which calculates the univariate IV and retains variables with higher IV; a correlation coefficient calculation module, which calculates correlation coefficients between variables to remove multicollinearity; and a variable selection module, which selects variables with the LASSO machine learning method and retains variables whose interpretability is higher than a preset value.
The null rate analysis and the level count analysis are described in detail below:
Null rate analysis: as shown in Table 1, the overall null rate of every variable and its null rate in each application month are calculated. Variables with a very high overall null rate, and variables whose null rate is clearly related to the application month, are deleted. For categorical variables, variables with a missing rate above 50% are deleted; for continuous variables, variables with a missing rate above 30% are deleted.
Level count analysis: the number of levels of every character-type variable is calculated, and single-level variables and variables containing many hard-to-interpret levels are deleted.
A specific embodiment is illustrated with Tables 2 and 3:
Univariate IV and KS: as shown in Table 2, the IV of every candidate variable and the KS of every numeric variable are calculated, and the variables are ranked by IV and KS. The higher a variable's IV, the stronger its predictive ability; weakly predictive variables with IV below 0.01 are deleted.
Correlation coefficients between variables: as shown in Table 3, the correlation coefficients between all candidate variables are calculated, and for each pair whose correlation coefficient exceeds a set threshold, the variable with the smaller IV is deleted.
LASSO variable selection: the LASSO machine learning method is used to select the variables with stronger predictive ability from all candidate variables. In this embodiment, variables may be selected at this stage either with the LASSO method or with least angle regression (LAR); the details are not repeated here.
Stepwise regression: the final variables are selected with logistic regression and stepwise regression, and variables that are not sufficiently significant in the stepwise regression are removed manually.
Preferably, if two variables are strongly correlated, either the variable with the higher IV may be retained, or the variable whose distribution is more stable or which is easier to explain in business terms may be retained.
In a preferred embodiment, the deployment module is configured to score a test sample with the model to obtain the score distributions of good and bad customers and judge the model's predictive ability from the degree of overlap between the two distributions, and to predict the test sample with the model to obtain a KS or AUC value from which the model's predictive ability is judged.
In the credit scoring system provided by the present invention, multi-channel information such as telecom operator data, credit card data and debit card data is selected in the data acquisition stage; considering information about many aspects of the user improves the predictive performance and stability of the model. Derived fields based on time slices extract information with genuine predictive value and further improve predictive performance. In the data preparation stage, variables with high missing rates and variables containing levels that are hard to interpret are deleted, which improves the stability of the whole model. In the model development stage, variables exhibiting multicollinearity are deleted, further improving stability. Machine learning methods such as LASSO are used to select the variables with genuine predictive ability, improving the predictive ability of the final model.
Further, some specific embodiments of the present invention provide a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method executed by the terminal as described above when executing the program.
Referring to Fig. 5, a schematic structural diagram of a computer device 500 suitable for implementing the terminal device of the embodiments of the present application is shown.
As shown in Fig. 5, the computer device 500 includes a central processing unit (CPU) 501, which can perform various appropriate operations and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, etc.; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present invention, the processes described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509 and/or installed from the removable medium 511.
The flowcharts and block diagrams in the drawings illustrate possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions noted in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in the block diagrams and/or flowcharts, and each combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Obviously, the above embodiments of the invention are merely examples given to illustrate the invention clearly and do not limit its implementations. Those of ordinary skill in the art may make other variations or changes in different forms on the basis of the above description; not all embodiments can be exhaustively listed here. Any obvious variation or change derived from the technical solution of the invention remains within the scope of protection of the invention.

Claims (12)

1. A credit scoring method, characterized in that the method comprises:
Cleaning the raw data collected from credit card applicants and deleting variables that do not meet preset conditions;
Performing data binning on the retained character-type (categorical) variables;
Constructing a scoring model, scoring new applicants, and deciding whether to approve each application according to the scoring result.
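Claim 1 does not specify how the character-type variables are binned. One common approach in scorecard work is to pool rare categories and encode each bin by its weight of evidence (WOE). The sketch below assumes that approach; the function name `bin_categorical`, the label convention (1 = bad, 0 = good), the `min_count` pooling rule, and the 0.5 smoothing term are all illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter, defaultdict


def bin_categorical(values, labels, min_count=2, other="OTHER"):
    """Hypothetical binning of one character-type variable.

    Assumptions (not from the patent): label 1 = bad, 0 = good;
    categories seen fewer than `min_count` times are pooled into `other`;
    0.5 is added to every count so that empty cells do not give log(0).
    Returns (bin_of_category, woe_by_bin) with WOE = ln(%good / %bad).
    """
    freq = Counter(values)
    bin_of = {v: (v if freq[v] >= min_count else other) for v in freq}

    good = defaultdict(float)
    bad = defaultdict(float)
    for v, y in zip(values, labels):
        if y == 1:
            bad[bin_of[v]] += 1
        else:
            good[bin_of[v]] += 1

    bins = set(bin_of.values())
    total_good = sum(good[b] + 0.5 for b in bins)
    total_bad = sum(bad[b] + 0.5 for b in bins)
    woe = {
        b: math.log(((good[b] + 0.5) / total_good) / ((bad[b] + 0.5) / total_bad))
        for b in bins
    }
    return bin_of, woe
```

A positive WOE marks a bin with proportionally more good customers than bad ones; the WOE values can then replace the raw categories as model inputs.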
2. The method according to claim 1, characterized in that constructing the scoring model, scoring new applicants, and deciding whether to approve each application according to the scoring result comprises:
Determining and extracting the data types required by the scoring model;
Cleaning the extracted data to obtain neatly formatted candidate model variables;
Selecting among the cleaned candidate variables and retaining variables whose interpretability exceeds a first preset value;
Calculating the model's evaluation metrics to assess the overall predictive ability of the model;
Deploying the scorecard model to the production environment using a language suited to that environment.
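The claims do not state how model output becomes a scorecard score or how the approval decision is made. A widely used convention, assumed here, scales the log-odds of a logistic regression with a base score, base odds, and "points to double the odds" (PDO). The names `make_scorer` and `approve`, and the values 600, 1:50, 20, and the cutoff 560, are hypothetical placeholders for illustration only.

```python
import math


def make_scorer(intercept, coefs, base_score=600.0, base_odds=1 / 50, pdo=20.0):
    """Hypothetical scorecard scaling (not the patent's stated formula).

    intercept, coefs: logistic-regression parameters for ln(p_bad / p_good).
    base_odds: bad:good odds assigned `base_score` (assumed 1:50).
    pdo: points subtracted each time the bad odds double.
    """
    factor = pdo / math.log(2)
    offset = base_score + factor * math.log(base_odds)

    def score(features):
        # features: dict of (e.g. WOE-encoded) variable values
        log_odds = intercept + sum(w * features[name] for name, w in coefs.items())
        return offset - factor * log_odds  # higher score = lower risk

    return score


def approve(score, cutoff=560.0):
    """Approve when the score reaches an assumed business cutoff."""
    return score >= cutoff
```

Because the deployed scorecard reduces to an offset, a factor, and a coefficient per variable, it can be ported to whatever language the production environment uses.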
3. The method according to claim 2, characterized in that cleaning the extracted data to obtain neatly formatted candidate model variables comprises:
Analyzing the meaning of each variable and deleting post-loan variables and meaningless variables;
Analyzing each variable's null rate and deleting variables whose missing rate exceeds a second preset value;
Analyzing each variable's distribution and deleting single-valued variables.
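The three cleaning steps of claim 3 can be sketched as simple column filters. This is a minimal illustration under stated assumptions: records are plain dicts with `None` for missing values, post-loan and meaningless variables are supplied as a manually curated set `excluded_vars` (judging a variable's meaning is not automated here), and the 0.8 threshold stands in for the unspecified "second preset value". The function name is hypothetical.

```python
def clean_variables(rows, excluded_vars, max_missing_rate=0.8):
    """Hypothetical filter implementing the three checks of claim 3.

    rows: list of dicts, one per applicant; None or an absent key = missing.
    excluded_vars: post-loan / meaningless variables chosen by an analyst.
    max_missing_rate: assumed stand-in for the "second preset value".
    Returns the sorted names of the retained variables.
    """
    if not rows:
        return []
    n = len(rows)
    columns = sorted({name for row in rows for name in row})
    kept = []
    for col in columns:
        if col in excluded_vars:  # step 1: meaning-based exclusion
            continue
        values = [row.get(col) for row in rows]
        missing = sum(v is None for v in values)
        if missing / n > max_missing_rate:  # step 2: too many nulls
            continue
        if len({v for v in values if v is not None}) <= 1:  # step 3: single value
            continue
        kept.append(col)
    return kept
```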
4. The method according to claim 2, characterized in that selecting among the cleaned candidate variables comprises:
Calculating the univariate KS of each variable and retaining variables with higher KS;
Calculating the univariate IV of each variable and retaining variables with higher IV;
Calculating correlation coefficients between variables to remove multicollinearity;
Selecting variables using the LASSO machine-learning method and retaining variables whose interpretability exceeds a preset value.
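The first three selection steps of claim 4 can be sketched with the standard definitions of the Kolmogorov-Smirnov (KS) statistic, information value (IV), and Pearson correlation. The function names, the label convention (1 = bad), the 0.5 smoothing in IV, and the greedy "keep the higher-IV variable" pruning rule with a 0.7 threshold are assumptions; the patent gives no thresholds. LASSO is not reproduced here; in Python it is commonly done with scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`.

```python
import math
from collections import Counter


def ks_stat(x, y):
    """Max gap between cumulative bad and good distributions over x (1 = bad)."""
    pairs = sorted(zip(x, y))
    n_bad = sum(y)
    n_good = len(y) - n_bad
    cum_bad = cum_good = 0
    best = 0.0
    for i, (v, label) in enumerate(pairs):
        if label == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # evaluate only at the end of a run of tied values
        if i == len(pairs) - 1 or pairs[i + 1][0] != v:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best


def information_value(bins, y, eps=0.5):
    """IV = sum over bins of (%good - %bad) * ln(%good / %bad), smoothed by eps."""
    good, bad = Counter(), Counter()
    for b, label in zip(bins, y):
        (bad if label == 1 else good)[b] += 1
    keys = set(bins)
    total_good = sum(good[k] + eps for k in keys)
    total_bad = sum(bad[k] + eps for k in keys)
    iv = 0.0
    for k in keys:
        pg = (good[k] + eps) / total_good
        pb = (bad[k] + eps) / total_bad
        iv += (pg - pb) * math.log(pg / pb)
    return iv


def pearson(a, b):
    """Pearson correlation; assumes neither column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def prune_correlated(columns, iv, threshold=0.7):
    """Greedily keep high-IV variables, dropping any highly correlated with a kept one."""
    kept = []
    for name in sorted(columns, key=lambda c: iv[c], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Constant columns would make `pearson` divide by zero, which is one reason single-valued variables are removed during cleaning first.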
5. The method according to claim 2, characterized in that calculating the model's evaluation metrics to assess the overall predictive ability of the model comprises:
Scoring the test sample with the model to obtain the score distributions of good and bad customers, and judging the model's predictive ability by the degree of overlap between the two distributions;
Predicting the test sample with the model to obtain a KS value or an AUC value, and judging the model's predictive ability according to that value.
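The two evaluation criteria of claim 5 can be illustrated on a scored test sample. AUC is computed here as the Mann-Whitney probability that a randomly chosen good customer outscores a randomly chosen bad one (ties count one half), and the overlap of the two score distributions as the shared area of their normalized histograms. The label convention (1 = bad, higher score = lower risk), the 10-bin histogram, and the function names are assumptions made for this sketch.

```python
def auc(scores, labels):
    """P(score of a good customer > score of a bad customer); ties count 0.5."""
    good = [s for s, y in zip(scores, labels) if y == 0]
    bad = [s for s, y in zip(scores, labels) if y == 1]
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


def distribution_overlap(scores, labels, n_bins=10):
    """Shared area (0..1) of the good and bad score histograms; lower is better."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # all scores equal -> one occupied bin

    def hist(values):
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        return [c / len(values) for c in counts]

    h_good = hist([s for s, y in zip(scores, labels) if y == 0])
    h_bad = hist([s for s, y in zip(scores, labels) if y == 1])
    return sum(min(p, q) for p, q in zip(h_good, h_bad))
```

An AUC near 1 and an overlap near 0 both indicate that the model separates good and bad customers well; the pairwise AUC loop is quadratic and meant for illustration, not large test sets.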
6. A credit scoring system, comprising:
A data cleaning module, configured to clean the raw data collected from credit card applicants and delete variables that do not meet preset conditions;
A data categorization module, configured to perform data binning on the retained character-type variables;
A model construction module, configured to construct a scoring model, score new applicants, and decide whether to approve each application according to the scoring result.
7. The system according to claim 6, characterized in that the model construction module comprises:
An extraction module, configured to determine and extract the data types required by the scoring model;
An extracted-data cleaning module, configured to clean the extracted data to obtain neatly formatted candidate model variables;
A data selection module, configured to select among the cleaned candidate variables and retain variables whose interpretability exceeds the first preset value;
An evaluation module, configured to calculate the model's evaluation metrics and assess the overall predictive ability of the model;
A deployment module, configured to deploy the scorecard model to the production environment using a language suited to that environment.
8. The system according to claim 7, characterized in that the extracted-data cleaning module comprises:
A variable meaning analysis module, configured to analyze the meaning of each variable and delete post-loan variables and meaningless variables;
A variable null rate analysis module, configured to analyze each variable's null rate and delete variables whose missing rate exceeds the second preset value;
A variable distribution analysis module, configured to analyze each variable's distribution and delete single-valued variables.
9. The system according to claim 7, characterized in that the data selection module comprises:
A KS calculation module, configured to calculate the univariate KS of each variable and retain variables with higher KS;
An IV calculation module, configured to calculate the univariate IV of each variable and retain variables with higher IV;
A correlation coefficient calculation module, configured to calculate correlation coefficients between variables and remove multicollinearity;
A variable selection module, configured to select variables using the LASSO machine-learning method and retain variables whose interpretability exceeds a preset value.
10. The system according to claim 6, characterized in that the deployment module is configured for:
Scoring the test sample with the model to obtain the score distributions of good and bad customers, and judging the model's predictive ability by the degree of overlap between the two distributions;
Predicting the test sample with the model to obtain a KS value or an AUC value, and judging the model's predictive ability according to that value.
11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that
the processor, when executing the program, implements the method according to any one of claims 1-5.
12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201810947751.2A 2018-08-20 2018-08-20 Credit-graded approach, system, computer equipment and readable medium Pending CN109087196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810947751.2A CN109087196A (en) 2018-08-20 2018-08-20 Credit-graded approach, system, computer equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810947751.2A CN109087196A (en) 2018-08-20 2018-08-20 Credit-graded approach, system, computer equipment and readable medium

Publications (1)

Publication Number Publication Date
CN109087196A (en) 2018-12-25

Family

ID=64794071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810947751.2A Pending CN109087196A (en) 2018-08-20 2018-08-20 Credit-graded approach, system, computer equipment and readable medium

Country Status (1)

Country Link
CN (1) CN109087196A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135467A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 Model training method, device, system and recording medium based on data splicing
CN110196797A (en) * 2019-06-06 2019-09-03 苏宁消费金融有限公司 Automatic optimization method and system suitable for credit scoring card system
CN110335134A (en) * 2019-04-15 2019-10-15 梵界信息技术(上海)股份有限公司 Method for classifying credit customer qualifications based on WOE transformation
CN110659817A (en) * 2019-09-16 2020-01-07 上海云从企业发展有限公司 Data processing method and device, machine readable medium and equipment
CN111861704A (en) * 2020-07-10 2020-10-30 深圳无域科技技术有限公司 Wind control feature generation method and system
CN112529477A (en) * 2020-12-29 2021-03-19 平安普惠企业管理有限公司 Credit evaluation variable screening method, device, computer equipment and storage medium
CN112862593A (en) * 2021-01-28 2021-05-28 深圳前海微众银行股份有限公司 Credit scoring card model training method, device, system and computer storage medium
CN112906723A (en) * 2019-11-19 2021-06-04 北京京邦达贸易有限公司 Feature selection method and device
CN113222632A (en) * 2020-02-04 2021-08-06 北京京东振世信息技术有限公司 Object mining method and device
CN116012143A (en) * 2023-01-03 2023-04-25 睿智合创(北京)科技有限公司 Variable selection and parameter estimation method under case-division regression
CN114595244B (en) * 2022-03-11 2023-10-17 抖音视界有限公司 Method and device for aggregating crash data, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106408411A (en) * 2016-08-31 2017-02-15 北京城市网邻信息技术有限公司 Credit assessment method and device
CN106897918A (en) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 Hybrid machine-learning credit scoring model construction method
CN108022146A (en) * 2017-11-14 2018-05-11 深圳市牛鼎丰科技有限公司 Feature item processing method and device for credit-reporting data, and computer equipment

Non-Patent Citations (2)

Title
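张万军 (Zhang Wanjun): "Research on a Personal Credit Risk Assessment Model Based on Big Data" (《基于大数据的个人信用风险评估模型研究》), China Doctoral Dissertations Full-text Database, Economics and Management Sciences *
肖明 (Xiao Ming): "An Empirical Study of Knowledge Maps in Foreign Library and Information Science" (《国外图书情报知识图谱实证研究》), 31 March 2018 *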

Cited By (15)

Publication number Priority date Publication date Assignee Title
CN110335134A (en) * 2019-04-15 2019-10-15 梵界信息技术(上海)股份有限公司 Method for classifying credit customer qualifications based on WOE transformation
CN110135467A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 Model training method, device, system and recording medium based on data splicing
CN110196797B (en) * 2019-06-06 2022-08-02 苏宁消费金融有限公司 Automatic optimization method and system suitable for credit scoring card system
CN110196797A (en) * 2019-06-06 2019-09-03 苏宁消费金融有限公司 Automatic optimization method and system suitable for credit scoring card system
CN110659817A (en) * 2019-09-16 2020-01-07 上海云从企业发展有限公司 Data processing method and device, machine readable medium and equipment
CN112906723A (en) * 2019-11-19 2021-06-04 北京京邦达贸易有限公司 Feature selection method and device
CN112906723B (en) * 2019-11-19 2024-01-16 北京京邦达贸易有限公司 Feature selection method and device
CN113222632A (en) * 2020-02-04 2021-08-06 北京京东振世信息技术有限公司 Object mining method and device
CN111861704A (en) * 2020-07-10 2020-10-30 深圳无域科技技术有限公司 Wind control feature generation method and system
CN112529477A (en) * 2020-12-29 2021-03-19 平安普惠企业管理有限公司 Credit evaluation variable screening method, device, computer equipment and storage medium
CN112862593A (en) * 2021-01-28 2021-05-28 深圳前海微众银行股份有限公司 Credit scoring card model training method, device, system and computer storage medium
CN112862593B (en) * 2021-01-28 2024-05-03 深圳前海微众银行股份有限公司 Credit scoring card model training method, device and system and computer storage medium
CN114595244B (en) * 2022-03-11 2023-10-17 抖音视界有限公司 Method and device for aggregating crash data, electronic equipment and storage medium
CN116012143A (en) * 2023-01-03 2023-04-25 睿智合创(北京)科技有限公司 Variable selection and parameter estimation method under case-division regression
CN116012143B (en) * 2023-01-03 2023-10-13 睿智合创(北京)科技有限公司 Variable selection and parameter estimation method under case-division regression

Similar Documents

Publication Publication Date Title
CN109087196A (en) Credit-graded approach, system, computer equipment and readable medium
CN107330445B (en) User attribute prediction method and device
JP7090936B2 (en) ESG-based corporate evaluation execution device and its operation method
Almana et al. A survey on data mining techniques in customer churn analysis for telecom industry
CN110223168A (en) Label-propagation anti-fraud detection method and system based on a business relationship graph
CN113935434A (en) Data analysis processing system and automatic modeling method
CN110008259A (en) Method and terminal device for visualized data analysis
CN106503863A (en) Age-characteristic prediction method, system and terminal based on a decision-tree model
CN105574544A (en) Data processing method and device
CN111199474A (en) Risk prediction method and device based on network diagram data of two parties and electronic equipment
CN111325619A (en) Credit card fraud detection model updating method and device based on joint learning
CN109495479A (en) User abnormal-behavior recognition method and device
CN107145516A (en) Text clustering method and system
CN110046889A (en) Detection method, device and server for abnormal-behavior subjects
CN110084627A (en) Method and apparatus for predicting a target variable
CN108228622A (en) Classification method and device for traffic issues
CN113239268B (en) Commodity recommendation method, device and system
CN107908616A (en) Method and apparatus for predicting trend words
CN108846695A (en) Method and device for predicting the terminal replacement cycle
CN110727740B (en) Correlation analysis method and device, computer equipment and readable medium
CN107679209B (en) Classification expression generation method and device
CN112884569A (en) Credit assessment model training method, device and equipment
CN109102396A (en) User credit rating method, computer equipment and readable medium
CN112016855A (en) User industry identification method and device based on relational network matching and electronic equipment
CN106919997A (en) LDA-based customer consumption prediction method for e-commerce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181225