CN110163743A - A credit scoring method based on hyperparameter optimization - Google Patents
- Publication number
- CN110163743A (application number CN201910347744.3A)
- Authority
- CN
- China
- Prior art keywords
- credit
- function
- value
- evaluation function
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The present invention relates to a credit scoring method based on hyperparameter optimization, comprising the steps of: S1, collecting information data on the scoring subjects, preprocessing the data and selecting features, and building a training dataset and a test dataset; S2, building a credit scoring model with the XGBoost algorithm, and optimizing the algorithm's hyperparameters with a Gaussian process combined with Bayesian optimization; S3, fixing the XGBoost algorithm with the optimal hyperparameter group and training the credit scoring model on the training dataset; S4, predicting and evaluating with the credit scoring model on the test dataset, and calculating the credit score by the formula score = A - B*ln(p/(1-p)). During hyperparameter optimization, when the curve of the objective function cannot be determined, the invention makes an assumption that the objective function follows a multivariate Gaussian distribution and then progressively corrects the model of that assumption. This improves the efficiency and reliability of hyperparameter optimization, speeds up model building, helps enterprises iterate their models more efficiently, and strengthens risk control.
Description
Technical field
The present invention relates to the field of computer credit scoring technology, and in particular to a credit scoring method based on hyperparameter optimization.
Background art
With the rapid development of the internet lending industry, risk problems keep emerging. Controlling risk with models has become the preferred option for many enterprises, since credit evaluation with a model greatly improves approval efficiency and saves labor costs.
In the modeling process, tuning the model's hyperparameters takes a great deal of time. The usual hyperparameter optimization methods are grid search and Monte Carlo (random) search. Grid search cannot handle continuous parameters, and once the number of parameter combinations grows, the traversal time grows exponentially and becomes very long; random search, for its part, cannot use prior knowledge when choosing the next group of hyperparameters.
Summary of the invention
The purpose of the present invention is to provide a credit scoring method based on hyperparameter optimization that addresses the problem that tuning model hyperparameters during modeling consumes a great deal of time.
To achieve the above object, the technical solution of the present invention is as follows:
A credit scoring method based on hyperparameter optimization comprises the following steps:
S1, collect information data on the scoring subjects, preprocess the data and select features, and build a training dataset and a test dataset;
S2, model with the XGBoost algorithm, assume that the evaluation function is a Gaussian process function, and select the optimal hyperparameter group using the expected improvement (EI) optimization criterion;
S3, train the credit scoring model with the optimal hyperparameter group and the training dataset;
S4, predict and evaluate with the credit scoring model on the test dataset, and calculate the credit score by the formula score = A - B*ln(p/(1-p)), where p is the model's predicted probability and A and B are constants.
In step S1, the scoring-subject information data include application information, mobile carrier data, UnionPay profile data and main strategies; data preprocessing includes desensitization and WOE encoding, after which the training dataset and test dataset are built.
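WOE (weight of evidence) encoding replaces each bin or category of a feature with the log ratio of its share of good samples to its share of bad samples. The sketch below is a minimal illustration, assuming the feature has already been binned into categories, that labels follow the convention used in the embodiment (bad = 1, good = 0), and that a small additive `smoothing` term (a hypothetical parameter, not from the source) guards against empty cells. Sign conventions for WOE vary; this one uses ln(%good / %bad).

```python
import math
from collections import Counter


def woe_table(categories, labels, smoothing=0.5):
    """Return {category: WOE}, with WOE = ln(share of goods / share of bads).

    labels: 1 = bad (overdue), 0 = good. `smoothing` is added to every cell
    so that a category with no goods or no bads does not produce log(0).
    """
    good, bad = Counter(), Counter()
    for cat, label in zip(categories, labels):
        if label == 1:
            bad[cat] += 1
        else:
            good[cat] += 1
    cats = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(cats)
    total_bad = sum(bad.values()) + smoothing * len(cats)
    table = {}
    for cat in cats:
        good_share = (good[cat] + smoothing) / total_good
        bad_share = (bad[cat] + smoothing) / total_bad
        table[cat] = math.log(good_share / bad_share)
    return table


def woe_encode(categories, table):
    """Replace each raw category with its WOE value."""
    return [table[cat] for cat in categories]
```

A positive WOE marks a bin that is richer in good samples than the population as a whole; a negative WOE marks a riskier bin.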
The assumed evaluation function is $y \sim \mathcal{N}(f(x), \sigma^2)$, where y is the output value, f is the Gaussian process function, $\mathcal{N}$ denotes the Gaussian distribution, f(x) is the prior probability model, and $\sigma^2$ is the variance. For a set of data points $x_{1:n} = \{x_1, \ldots, x_n\}$, the values of the evaluation function $f_{1:n} = \{f(x_1), \ldots, f(x_n)\}$ are assumed to be described by a Gaussian distribution, $f_{1:n} \sim \mathcal{N}(m(x_{1:n}), k)$, where $x_n$ is the n-th input vector (hyperparameter group), $x_{1:n}$ is the matrix of the 1st to n-th input vectors, $f(x_n)$ is the evaluation function value corresponding to $x_n$, $f_{1:n}$ is the set of evaluation function values corresponding to $x_1$ to $x_n$, m is the mean function, and k is the covariance matrix $k(x_{1:n}, x_{1:n})$ given by the covariance function.
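Under this assumption, conditioning the Gaussian process on the observed pairs (x_i, y_i) gives a posterior mean μ(x) and standard deviation σ(x) at any new point, which is what the EI criterion consumes. The sketch below is a minimal NumPy illustration, assuming a zero mean function m(x) = 0 and a squared-exponential (RBF) covariance function with a fixed length scale; the source does not specify the kernel, so both choices and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance k(x, x') = exp(-||x - x'||^2 / (2 l^2)).

    a has shape (n, d), b has shape (m, d); returns an (n, m) matrix.
    """
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6, length_scale=1.0):
    """Posterior mean mu(x) and standard deviation sigma(x) of a zero-mean GP."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new, length_scale)
    L = np.linalg.cholesky(K)  # K = L L^T, avoids forming K^-1 explicitly
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) = 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 0.0))
```

Near observed points σ(x) shrinks toward zero and μ(x) reproduces the observations; far from them the posterior falls back to the prior (mean 0, standard deviation 1).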
The expected improvement is computed as

$$EI(x) = E[\max\{0, f(x) - f(x^+)\}] = \begin{cases} (\mu(x) - f(x^+))\,\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}, \qquad Z = \frac{\mu(x) - f(x^+)}{\sigma(x)}$$

where Φ(Z) is the standard normal cumulative distribution function, φ(Z) is the standard normal probability density function, $f(x^+)$ is the Gaussian process function value at the current best hyperparameter group $x^+$, f(x) is the Gaussian process function value, E denotes expectation, μ(x) is the mean of the Gaussian process function at x, and σ(x) is its standard deviation at x. The EI optimization proceeds as follows: given evaluation function input values x, the Gaussian process updates the posterior estimate of the assumed evaluation function; a new input value x is obtained by maximizing EI; the output value y of the assumed evaluation function is recomputed at the new x; and this is repeated for a fixed number of iterations until convergence.
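For σ(x) > 0 the EI criterion has the closed form EI(x) = (μ(x) - f(x⁺))Φ(Z) + σ(x)φ(Z) with Z = (μ(x) - f(x⁺))/σ(x), and EI(x) = 0 when σ(x) = 0. The sketch below is an illustrative, dependency-free version of that closed form for a single candidate point, using the maximization convention; the function names are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal probability density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best):
    """EI(x) = (mu - f_best) * Phi(Z) + sigma * phi(Z), Z = (mu - f_best) / sigma.

    Returns 0 when sigma == 0, i.e. when the GP is certain about f(x).
    """
    if sigma <= 0.0:
        return 0.0
    improvement = mu - f_best
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)
```

With μ(x) = f(x⁺) and σ(x) = 1 the first term vanishes and EI equals φ(0) ≈ 0.399, so a point that is merely as good as the incumbent but uncertain still has positive value. This is how EI trades off exploitation (large μ) against exploration (large σ).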
With the credit scoring method based on hyperparameter optimization of the invention, when the curve of the objective function cannot be determined, an assumption is made that the objective function follows a multivariate Gaussian distribution, and the model of that assumption is then progressively corrected. This improves the efficiency and reliability of hyperparameter optimization, speeds up model building, helps enterprises iterate their models more efficiently, and strengthens risk control.
Description of the drawings
Fig. 1 is a flow chart of the credit scoring method based on hyperparameter optimization in one embodiment of the invention.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing and embodiments.
As shown in Fig. 1, the credit scoring method based on hyperparameter optimization of the invention comprises the following steps:
S1, collect information data on the scoring subjects, preprocess the data and select features, and build a training dataset and a test dataset;
S2, build a credit scoring model with the XGBoost algorithm, assume that the evaluation function is a Gaussian process function, and select the optimal hyperparameter group using the EI optimization criterion;
S3, train the credit scoring model with the optimal hyperparameter group and the training dataset;
S4, predict and evaluate with the credit scoring model on the test dataset, and calculate the credit score by the formula score = A - B*ln(p/(1-p)), where p is the model's predicted probability and A and B are constants.
The scoring-subject information data include application information, mobile carrier data, UnionPay profile data and main strategies. Data preprocessing is performed by desensitizing the data and applying WOE encoding. Of the processed data, 70% is used to train the model (the training dataset) and 30% to test it (the test dataset).
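The 70/30 split can be sketched as a seeded random shuffle followed by a cut. This is a minimal illustration only; the function name and the fixed seed are assumptions, and for imbalanced credit data a stratified split (keeping the bad-sample ratio equal in both sets) is often preferred.

```python
import random


def split_70_30(samples, seed=42):
    """Shuffle a copy of the samples and return (train, test) in a 70:30 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]
```

The original sequence is left untouched, and the same seed always reproduces the same split.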
The hyperparameters of the algorithm are optimized with a Gaussian process combined with Bayesian optimization as follows: the evaluation function is assumed to be a Gaussian process function; a multivariate Gaussian distribution model is obtained by specifying its mean (expectation) function and covariance function; and the EI optimization criterion is then used to select the optimal hyperparameter group.
The assumed evaluation function is $y \sim \mathcal{N}(f(x), \sigma^2)$, where y is the output value of the evaluation function, f is the Gaussian process function, $\mathcal{N}$ denotes the Gaussian distribution, f(x) is the prior probability model, and $\sigma^2$ is the variance. For a set of data points $x_{1:n} = \{x_1, \ldots, x_n\}$, the values of the evaluation function $f_{1:n} = \{f(x_1), \ldots, f(x_n)\}$ are assumed to be described by a Gaussian distribution, $f_{1:n} \sim \mathcal{N}(m(x_{1:n}), k)$, where $x_n$ is the n-th input vector, $x_{1:n}$ is the matrix of the first to n-th input vectors, $f(x_n)$ is the evaluation function value corresponding to $x_n$, and $f_{1:n}$ is the set of evaluation function values corresponding to $x_1$ to $x_n$.
The expected improvement is computed as

$$EI(x) = E[\max\{0, f(x) - f(x^+)\}] = \begin{cases} (\mu(x) - f(x^+))\,\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}, \qquad Z = \frac{\mu(x) - f(x^+)}{\sigma(x)}$$

where x is the input value, $x^+$ is the current best hyperparameter group, Φ(Z) is the standard normal cumulative distribution function, φ(Z) is the standard normal probability density function, $f(x^+)$ is the Gaussian process function value at $x^+$, f(x) is the Gaussian process function value, E denotes expectation, μ(x) is the mean of the Gaussian process function at x, and σ(x) is its standard deviation at x. In the EI optimization, given evaluation function input values x, the Gaussian process updates the posterior estimate of the assumed evaluation function; a new x is obtained by maximizing EI; the output value y of the assumed evaluation function is recomputed at the new x; and this is repeated for a fixed number of iterations until it converges to the optimal hyperparameter group. Because the loss used to evaluate the model (the log loss below) is to be minimized, f can be taken as the negative loss so that maximizing EI seeks a lower loss.
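The iteration described above (update the GP posterior, maximize EI, evaluate the new point, repeat until convergence or an iteration limit) can be sketched end to end on a toy one-dimensional objective. This is a minimal NumPy illustration under stated assumptions, not the patent's implementation: it uses a zero-mean GP with a fixed-length-scale RBF kernel, standardizes the observed values before each fit, maximizes EI over a fixed grid of candidate points, and treats a proposal that has already been evaluated as convergence. The objective -(x - 2)^2 is a hypothetical stand-in for the negative validation loss of a model.

```python
import math

import numpy as np


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two 1-D arrays of points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)


def posterior(x_obs, y_obs, x_cand, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_cand."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    mu = K_s.T @ alpha
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 0.0)  # prior variance k(x, x) = 1
    return mu, np.sqrt(var)


_erf = np.vectorize(math.erf)


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for maximization; 0 wherever sigma == 0."""
    safe_sigma = np.where(sigma > 0, sigma, 1.0)
    z = (mu - f_best) / safe_sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    ei = (mu - f_best) * cdf + sigma * pdf
    return np.where(sigma > 0, ei, 0.0)


def bayes_opt(objective, low, high, init_x, n_iter=20, n_cand=501):
    """Maximize `objective` on [low, high] with GP + EI; return (best_x, best_y)."""
    x_obs = np.array(init_x, dtype=float)
    y_obs = np.array([objective(x) for x in x_obs])
    candidates = np.linspace(low, high, n_cand)
    for _ in range(n_iter):
        # Standardize observations so the unit-variance GP prior is reasonable.
        y_std = y_obs.std() or 1.0
        y_norm = (y_obs - y_obs.mean()) / y_std
        mu, sigma = posterior(x_obs, y_norm, candidates)
        ei = expected_improvement(mu, sigma, y_norm.max())
        x_next = candidates[np.argmax(ei)]
        if np.any(np.isclose(x_obs, x_next)):
            break  # proposal already evaluated: treat as converged
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]


best_x, best_y = bayes_opt(lambda x: -(x - 2.0) ** 2, 0.0, 5.0, init_x=[0.5, 4.5])
```

In a real search the objective would train XGBoost with the hyperparameters encoded by x and return the negative log loss on held-out data, and the candidate grid would be replaced by random sampling or a numerical optimizer over the four-dimensional search space.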
Finally, the credit scoring model is validated on the test dataset; the evaluation metrics include the ROC curve, the KS curve and the confusion matrix. The credit score is calculated by the formula score = A - B*ln(p/(1-p)), where p is the model's predicted probability and A and B are constants. A and B are determined by substituting two known or assumed score values into the formula. Usually two assumptions are needed: first, the expected score at a specific odds ratio; second, the number of points by which the score changes when the odds double.
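The two calibration assumptions fix A and B directly. If a base score S0 is assigned at base odds θ0 = p/(1 - p), and the score drops by PDO points (points to double the odds) whenever the odds double, then substituting into score = A - B*ln(odds) gives B = PDO / ln 2 and A = S0 + B*ln θ0. The sketch below computes A and B this way and also computes the KS statistic mentioned above as the maximum gap between the cumulative capture rates of bad and good samples. The numbers used (600 points at odds 1:50, PDO = 20) and the function names are illustrative assumptions, not values from the source.

```python
import math


def calibrate(base_score, base_odds, pdo):
    """Solve score = A - B*ln(odds) for A and B.

    base_odds is p / (1 - p) (bad:good) at which the score equals base_score;
    pdo is the number of points the score drops when the odds double.
    """
    b = pdo / math.log(2.0)
    a = base_score + b * math.log(base_odds)
    return a, b


def credit_score(p, a, b):
    """score = A - B * ln(p / (1 - p)), with p the predicted probability of default."""
    return a - b * math.log(p / (1.0 - p))


def ks_statistic(labels, probs):
    """Max |cumulative bad rate - cumulative good rate| over probability thresholds.

    labels: 1 = bad, 0 = good. Tied probabilities are processed as one group.
    """
    pairs = sorted(zip(probs, labels), reverse=True)
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


a, b = calibrate(base_score=600, base_odds=1 / 50, pdo=20)  # A ≈ 487.12, B ≈ 28.85
```

With these constants a sample whose predicted odds are exactly 1:50 scores 600, and one whose odds are 2:50 scores 580.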
In one embodiment, the information data of the scoring subjects cover a credit dimension (age, job stability and credit card credit limit) and a risk dimension (the overdue label). Overdue samples are bad samples and non-overdue samples are good samples; good samples are labeled 0 and bad samples 1, and these labels are used to train the model.
The credit scoring model is built with the XGBoost algorithm, and model performance under different parameters is evaluated with the log loss

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\right]$$

where N is the number of samples, $y_i$ is the true value of the i-th sample, and $p_i$ is the predicted value of the i-th sample. The hyperparameters with the greatest influence on the algorithm are selected for optimization: min_child_weight, learning_rate, max_depth and n_estimators, with the parameter ranges min_child_weight: (1, 5), learning_rate: (0.01, 0.1), max_depth: (1, 100) and n_estimators: (10, 500).
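The log loss above can be computed directly from labels and predicted probabilities. The sketch below is intended only to make the formula concrete; it clips probabilities away from 0 and 1 with an assumed `eps` bound, a common practice to avoid ln 0.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy: -(1/N) * sum(y ln p + (1 - y) ln(1 - p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid ln(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

A constant prediction of 0.5 gives ln 2 ≈ 0.693 regardless of the labels, which is a handy sanity baseline.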
The Gaussian process function is initialized by choosing initial x values, for example x1 = (1, 0.01, 1, 10) and x2 = (1, 0.03, 2, 12). The Gaussian process is used to update the posterior estimate of the assumed evaluation function; the EI criterion is then used to find the x that maximizes EI, and the value of the Gaussian process function f at that x is computed. Whether the target is met is decided by whether the loss function has reached its minimum or the iteration limit has been reached: if it is met, x is saved; if not, the Gaussian process function is updated again. This yields the best hyperparameter values, for example x = (1, 0.01, 1, 10). Finally, the XGBoost parameters are fixed to the best hyperparameter values x = (1, 0.01, 1, 10), and the credit scoring model is trained on the training dataset.
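Since the Gaussian process works on a numeric vector while XGBoost expects named parameters, some mapping between the two is needed, with the integer-valued parameters rounded. The sketch below assumes the vector order x = (min_child_weight, learning_rate, max_depth, n_estimators) used in the examples above and the ranges listed earlier; `to_xgb_params` and `sample_point` are hypothetical helpers. The resulting dict uses parameter names accepted by the scikit-learn interface of XGBoost (for example `xgboost.XGBClassifier(**params)`), but XGBoost itself is not imported here.

```python
import random

# (name, low, high, type) for each hyperparameter, in the order used for x
SEARCH_SPACE = [
    ("min_child_weight", 1, 5, float),
    ("learning_rate", 0.01, 0.1, float),
    ("max_depth", 1, 100, int),
    ("n_estimators", 10, 500, int),
]


def to_xgb_params(x):
    """Clip each coordinate of x to its range and round integer parameters."""
    if len(x) != len(SEARCH_SPACE):
        raise ValueError(f"expected {len(SEARCH_SPACE)} values, got {len(x)}")
    params = {}
    for value, (name, low, high, kind) in zip(x, SEARCH_SPACE):
        value = min(max(value, low), high)
        params[name] = int(round(value)) if kind is int else float(value)
    return params


def sample_point(rng):
    """Draw one random initial point uniformly inside the search space."""
    return tuple(rng.uniform(low, high) for _, low, high, _ in SEARCH_SPACE)
```

For example, `to_xgb_params((1, 0.01, 1, 10))` returns `{'min_child_weight': 1.0, 'learning_rate': 0.01, 'max_depth': 1, 'n_estimators': 10}`.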
In the credit scoring method based on hyperparameter optimization of the invention, the evaluation function is assumed to follow a Gaussian distribution during hyperparameter optimization; the input value x is chosen by maximizing EI, the Gaussian process function value is computed, whether the target is met is checked, and the Gaussian process function is updated iteratively. In this way, when the curve of the objective function cannot be determined, an assumption is made that the objective function follows a multivariate Gaussian distribution, and the model of that assumption is progressively corrected.
The embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit it; any modification, supplement or equivalent replacement made within the spirit of the present invention shall fall within its scope of protection.
Claims (4)
1. A credit scoring method based on hyperparameter optimization, characterized by comprising the following steps:
S1, collecting information data on the scoring subjects, preprocessing the data and selecting features, and building a training dataset and a test dataset;
S2, building a credit scoring model with the XGBoost algorithm, assuming that the hyperparameter evaluation function is a Gaussian process function, and selecting the optimal hyperparameter group with the EI optimization criterion;
S3, training the credit scoring model with the optimal hyperparameter group and the training dataset;
S4, predicting and evaluating with the credit scoring model on the test dataset, and calculating the credit score by the formula score = A - B*ln(p/(1-p)), where p is the model's predicted probability and A and B are constants.
2. The credit scoring method based on hyperparameter optimization according to claim 1, characterized in that: in step S1, the scoring-subject information data include application information, mobile carrier data, UnionPay profile data and main strategies; data preprocessing includes desensitization and WOE encoding; and feature selection selects the fields used for modeling from all fields, including age and education from the application information, and carrier data such as time on the network and the call detail records of the past month.
3. The credit scoring method based on hyperparameter optimization according to claim 1, characterized in that the hyperparameter evaluation function is assumed to be $y \sim \mathcal{N}(f(x), \sigma^2)$, where y is the value of the evaluation function, f is the Gaussian process function, $\mathcal{N}$ denotes the Gaussian distribution, f(x) is the prior probability model and $\sigma^2$ is the variance; for a set of data points $x_{1:n} = \{x_1, \ldots, x_n\}$, the values of the evaluation function $f_{1:n} = \{f(x_1), \ldots, f(x_n)\}$ are assumed to be described by a Gaussian distribution, $f_{1:n} \sim \mathcal{N}(m(x_{1:n}), k)$, where $x_n$ is the n-th input vector, $x_{1:n}$ is the matrix of the 1st to n-th input vectors, $f(x_n)$ is the evaluation function value corresponding to $x_n$, and $f_{1:n}$ is the set of evaluation function values corresponding to $x_1$ to $x_n$.
4. The credit scoring method based on hyperparameter optimization according to claim 3, characterized in that, according to the formula $EI(x) = E[\max\{0, f(x) - f(x^+)\}] = (\mu(x) - f(x^+))\Phi(Z) + \sigma(x)\phi(Z)$ for $\sigma(x) > 0$ (and $EI(x) = 0$ for $\sigma(x) = 0$), with $Z = (\mu(x) - f(x^+))/\sigma(x)$, where $f(x^+)$ is the Gaussian process function value at the current best hyperparameter group $x^+$, f(x) is the Gaussian process function value, E denotes expectation, μ(x) is the mean of the Gaussian process function at x, σ(x) is its standard deviation at x, Φ(Z) is the standard normal cumulative distribution function and φ(Z) is the standard normal probability density function: given evaluation function input values x, the Gaussian process function updates the posterior estimate of the assumed evaluation function; a new evaluation function input value x is obtained by maximizing EI; the output value y of the assumed evaluation function is recomputed at the new input value x; and this is repeated for a fixed number of iterations until convergence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910347744.3A CN110163743A (en) | 2019-04-28 | 2019-04-28 | A credit scoring method based on hyperparameter optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110163743A true CN110163743A (en) | 2019-08-23 |
Family ID: 67640152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910347744.3A Withdrawn CN110163743A (en) | A credit scoring method based on hyperparameter optimization | 2019-04-28 | 2019-04-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110163743A (en) |
- 2019-04-28: application CN201910347744.3A filed in CN, published as CN110163743A; status: not active (withdrawn)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180322406A1 (en) * | 2017-05-04 | 2018-11-08 | Zestfinance, Inc. | Systems and methods for providing machine learning model explainability information |
CN107798600A (en) * | 2017-12-05 | 2018-03-13 | 深圳信用宝金融服务有限公司 | The credit risk recognition methods of the small micro- loan of internet finance and device |
CN108154430A (en) * | 2017-12-28 | 2018-06-12 | 上海氪信信息技术有限公司 | A kind of credit scoring construction method based on machine learning and big data technology |
CN108596757A (en) * | 2018-04-23 | 2018-09-28 | 大连火眼征信管理有限公司 | A kind of personal credit file method and system of intelligences combination |
Non-Patent Citations (2)
Title |
---|
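王重仁 (Wang Chongren) et al.: "Personal credit evaluation for internet lending based on hyperparameter optimization and ensemble learning", 《统计与决策》 (Statistics and Decision) * |
韩修龙 (Han Xiulong): "User credit scoring modeling based on XGBoost", 《电脑知识与技术》 (Computer Knowledge and Technology) * |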
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553482A (en) * | 2020-04-09 | 2020-08-18 | 哈尔滨工业大学 | Method for adjusting and optimizing hyper-parameters of machine learning model |
CN111553482B (en) * | 2020-04-09 | 2023-08-08 | 哈尔滨工业大学 | Machine learning model super-parameter tuning method |
CN111951097A (en) * | 2020-08-12 | 2020-11-17 | 深圳微众信用科技股份有限公司 | Enterprise credit risk assessment method, device, equipment and storage medium |
CN112734568A (en) * | 2021-01-29 | 2021-04-30 | 深圳前海微众银行股份有限公司 | Credit scoring card model construction method, device, equipment and readable storage medium |
CN112734568B (en) * | 2021-01-29 | 2024-01-12 | 深圳前海微众银行股份有限公司 | Credit scoring card model construction method, device, equipment and readable storage medium |
CN113919933A (en) * | 2021-08-25 | 2022-01-11 | 北京睿知图远科技有限公司 | Client scoring verification method based on quality label |
CN113673174A (en) * | 2021-09-08 | 2021-11-19 | 中国平安人寿保险股份有限公司 | Hyper-parameter determination method, device, equipment and storage medium |
CN113673174B (en) * | 2021-09-08 | 2023-07-25 | 中国平安人寿保险股份有限公司 | Super parameter determination method, device, equipment and storage medium |
CN113793212A (en) * | 2021-09-24 | 2021-12-14 | 重庆富民银行股份有限公司 | Credit assessment method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190823 |